1  Summary


During  the  period  1  July  1984  -  31  January  1990,  the  research  under  Grant  AFOSR-84-0181  has 
been  concerned  with  binary  parallel  optical  computing  architectures  with  particular  attention 
to  cellular  logic  and  symbolic  substitution  for  pattern  recognition  and  numerical  operations. 
Our  approach  has  been  to  experimentally  implement  binary  optical  cellular  logic  processors  and 
interconnection  arrays;  define  an  instruction  set  and  software  suited  to  optical  computing  systems; 
and  to  study  generalizations  of  optical  cellular  logic  processors  such  as  the  cellular  hypercube.  The 
results  include  the  experimental  implementation  of  a  54-gate  binary  optical  cellular  logic  processor 
with  instruction  decoders,  input/output,  memory  and  test/branch  functions;  the  completion  of  a 
binary  image  algebra  (BIA)  description  of  cellular  logic,  image  analysis  and  symbolic  operations; 
and  the  development  of  binary  image  algebra  algorithms  for  scale  and  shift  invariant  pattern 
recognition.  Additional  work  concerns  the  relationship  of  parallel  computation  paradigms  to 
optical  computing  and  halftone  screen  techniques  for  implementing  general  nonlinear  functions. 
2  Research  Progress


This  section  summarizes  research  progress  and  accomplishments  for  the  period  1  July  1984  - 
31  January  1990  on  Grant  AFOSR-84-0181  for  Nonlinear  Real-Time  Optical  Signal  Processing. 
These  results  are  discussed  separately  in  the  sections  that  follow. 

2.1  Digital  Optical  Parallel  Computing  Systems 

We  have  continued  work  on  an  experimental  sequential  optical  binary  parallel  architecture  that  is 
constructed  from  an  array  of  binary  optical  switching  elements  (NOR  gates)  with  interconnections 
done  by  a  computer-generated  hologram.  We  examined  new  binary  array  spatial  light  modulators 
(SLM’s),  high  efficiency,  high  space-bandwidth  product  (SBWP)  interconnection  holograms,  and 
compact reflection versions of the general architecture with the intent of building a larger demonstration
system with greater capabilities. We have studied improved methods of providing the
interconnections in these systems by the use of hybrid digital/analog (facet) holograms. A final
area  of  study  has  been  to  examine  in  detail  algorithms  that  are  well-suited  for  implementation 
on  the  parallel  binary  architectures  described  previously.  We  have  defined  several  methods  for 
building  binary  and  arithmetic  cellular  logic  processors  and  have  determined  some  limits  due  to 
hologram  complexity,  gate  density,  etc. 

A reprint describing the general types of system under consideration is included for reference:
B.K. Jenkins and A.A. Sawchuk, “Binary Optical Computing Architectures”,
Optics News, Vol. 12, No. 4, pp. 25-26, 1986.
imagine two possible paths for this development: an
evolution or a revolution.

For the moment, optical components are used sparingly
in predominantly electronic computers. The operating
mode of the optical devices is dictated by the
existing structure of the computer, which in turn has
been shaped by the characteristics of semiconductor
devices. As optics is increasingly used in computers,
it will perhaps become apparent that the traditional
advantages of optical computing can be used to enhance
the capability of these optical components.

For instance, if optical memories become commonplace,
it will be inevitable that parallel access of these
memories will be attempted in order to get to the
stored information faster, which will probably create
a communication bottleneck in a traditional computing
structure. This may bring about the need for more
extensive global communication between the memory
and the processing elements in the computer, and
this, in turn, may require an optical solution for the
communication problem that the optical memory created.

Evolutionary processes of this type are certain to
occur. The only question is how far they will go. Will
enough optical devices and optical techniques eventually
be inserted in an electronic computer to allow
us to call this machine an optical computer rather
than an electronic computer with some optical components?

I don't think the answer to this question is important.
What is important is the realization that if optical
components are used in computers, then there is a
tremendous potential for improvement in the performance
through “optical computing” ideas and techniques.

The evolutionary path to optical computing, as outlined
above, is something that was not planned by
anybody. Most of us working on optical computers
imagine their development as a revolution; we start
with a new technology (optics), and find new ways to
solve problems that are suited to this technology, and
we find a new set of important problems to solve that
present computers have a hard time with. The successful
path towards the development of such radically
new computers is, needless to say, not certain and
not yet universally agreed upon.

In order to be competitive with the well-entrenched
semiconductor technology, it is important
that we identify clearly the comparatively advantageous
features of optics and try to make the best use
possible of them. Global communication and the capacity
for dense storage of information are two of the
strong suits of optics. We ought to be able to find
ways to store large amounts of information optically,
and also have ways to retrieve this information very
quickly using optical communication.


If we do this successfully, then we are left with the
question of how to process all this information we are
retrieving. The possible answers may lie in the area
of neural network models.

A neural network is a very large collection of neurons
(perhaps 100 billion of them), with each neuron
being connected to thousands of others. The capability
to store large amounts of information and the ability
to get to this information quickly and efficiently are
what give neural networks their computational power.
This is accomplished by storing the information in
the interconnections among the neurons, which is a
most ingenious way to avoid bottlenecks in trying to
get to the stored information.

The problems that neural networks are particularly
good at (recognition of images and speech, classification
of patterns, and associations) are problems that
electronic computers are particularly poor at solving.
There is an excellent match between the capabilities
and strengths of optics and neural networks. Therefore,
if we can obtain some insights from the work of
biologists and others who study neural networks, this
can prove very helpful in our effort to design optical
computers.

A natural compatibility has emerged between the
requirements of problems typically encountered in
artificial intelligence, the ways that such problems
are solved by a neural network, and the capabilities of
optics. The development of optical computers that
exploit this compatibility is one of the exciting and
promising prospects for the future of the field.
Binary parallel optical computing architectures
differ greatly from traditional optical analog
and numerical processors. The potential advantages
of these architectures are:

■ They offer flexibility of operations (numerical,
symbolic or logical) compared to analog or discrete
multilevel processors;

■ They have binary digital accuracy and dynamic
range;

■ They offer computing architectures very different
from electronic very large scale integration (VLSI):
they permit global interconnections and parallel
input-output, compared to the local interconnections,
clock-skew problems, pin-in/pin-out and bus limitations
of VLSI;

■ They utilize the 2-D parallel nature of optical
device arrays and low interaction of optical signals for
interconnections in 3-D; and

■ They offer the possibility of high throughput and
processing speed with arrays of fast optical switches
that are being developed.


[Figure: An optical digital computing architecture with global
interconnections. Labeled element: interconnection unit
(computer-generated hologram).]


The figure shows one concept of an optical sequential
logic system with global interconnections. An
experimental system based on this concept has been
built. The gate array at the bottom is a 2-D array of
optical NOR gates formed on the surface of a liquid-crystal
light valve or other spatial light modulator having
an array of optically activated switching elements.
The light valve operates in transmission, so that the
inputs are applied on the left side of the device (hidden
in the figure), while the outputs are obtained on
the right side of the device.

With this arrangement, optical signal inputs and
outputs are accessible in parallel. An interconnection
unit (which is currently a computer-generated hologram)
connects gate outputs to gate inputs in a very
general way and forms the “wiring.” The gate
interconnections can be global or local with equal ease,
because a third dimension is utilized and optical signals
can propagate through each other with minimal
effective interaction. Components such as flip-flops,
registers, memory, instruction decoders, arithmetic
logic units, and central processing units are defined
by a fixed wiring pattern. The resulting machine
could, in principle, be general-purpose or special-purpose,
and could be programmable.
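
As a rough illustration of how a fixed wiring pattern turns an array of NOR gates into a memory element, the following sketch simulates a small gate array in software. It is not the authors' system: the function names, the wiring-table representation, and the unit-delay synchronous update are all assumptions made for the example.

```python
def step(state, wiring, external):
    """Advance every NOR gate by one time step, all gates switching in parallel.

    state    -- list of 0/1 gate outputs from the previous step
    wiring   -- wiring[g] lists the gates whose outputs feed gate g
                (this table plays the role of the interconnection unit)
    external -- external[g] lists extra 0/1 signals applied to gate g
    """
    new_state = []
    for g in range(len(state)):
        inputs = [state[src] for src in wiring[g]] + list(external[g])
        new_state.append(0 if any(inputs) else 1)  # NOR of all inputs
    return new_state


def settle(state, wiring, external, max_steps=10):
    """Iterate until the gate outputs stop changing (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = step(state, wiring, external)
        if nxt == state:
            break
        state = nxt
    return state


# Two cross-coupled NOR gates form a set-reset latch:
# gate 0 outputs Q, gate 1 outputs not-Q.
LATCH_WIRING = [[1], [0]]


def latch(state, set_bit, reset_bit):
    """Apply set/reset inputs to the latch and return the settled state."""
    # Reset drives the Q gate, set drives the not-Q gate.
    return settle(state, LATCH_WIRING, [[reset_bit], [set_bit]])
```

Changing only the `LATCH_WIRING` table changes the circuit, which mirrors the idea that the hologram alone defines the components built from the gate array.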

The interconnection wiring can be altered by
changing the hologram; thus a binary optical computer
with dynamically reconfigurable wiring is a possibility.
A wide variety of hologram encoding techniques
can be utilized, and the gate array may contain
≈ 10^5 or 10^6 gates. One important aspect of this system
is that it can be configured as a binary parallel computer,
which is very different from traditional architectures.

An experimental version of the system in the figure
has been implemented. The system is an all-optical
16-gate digital sequential circuit, including clock and
flip-flop, implemented using a Hughes liquid-crystal
light valve. The system contains a high-resolution
computer-generated hologram for gate interconnection,
which was made on an electron-beam integrated
circuit mask writer. Many different optical systems
using different types of computer-generated holograms
can be used for the interconnections; current
research is concerned with comparing various alternatives
and improving hologram resolution and flexibility.

Several computationally demanding practical problems
such as parallel digital image processing and image
analysis are well-matched to this architecture. In
the future, a large (≈ 10^6 gates) 2-D array of binary
switching (threshold) or bistable devices, preferably
all-optical (optical input and output), could be used in
the system, which could provide nanosecond switching
times. Many alternative technologies exist; they
must be compared and evaluated.
2.2  Optical  Cellular  Logic  Processors


We have continued work on optical cellular logic processors (CLP’s) and other parallel digital
processing architectures that are well-suited for implementation on the sequential optical
architecture described in the previous section. Optical CLP’s are well matched to this architecture
because they are a 2-D, page-oriented array of individual processors located at every pixel of a
large image. The attached paper by B.K. Jenkins and A.A. Sawchuk, “Optical Cellular Logic
Architectures for Image Processing”, from IEEE Computer Society Workshop on Computer
Architectures for Pattern Analysis and Image Database Management, November 1985, summarizes
some of these concepts. Work in progress includes studies on the implementation of cellular
hypercubes and pyramids, which are not feasible for electronic VLSI, but offer important advantages
for improved image processing.
OPTICAL  CELLULAR  LOGIC  ARCHITECTURES  FOR  IMAGE  PROCESSING
Electrical  Engineering  -  Systems  Department
Los  Angeles,  California  90089-0272


ABSTRACT 

A digital optical processing system consisting of optical
gates and optical interconnections is described. Its concept
has been demonstrated experimentally. The implementation
of an optical cellular logic processor is considered. Optical
systems for interconnections are given, and the architectural
characteristics of such optical cellular logic machines are
discussed and compared with electronic machines.

I.  INTRODUCTION 

In the optical processing community there has been a
substantial amount of research in the area of optical image
processing and pattern recognition. Most of the systems to
date have been analog, and have the potential of processing
large amounts of data in parallel and at high speeds. However,
their analog nature limits both the accuracy and the
range of operations that can be achieved with a given system.
A digital optical system holds the promise of alleviating
these restrictions while maintaining some of the advantages
of optical systems. A cellular logic processor is one
possible use of such a digital optical computer.

The current interest in digital optical computing can be
largely attributed to two developments: (1) recent progress
in optical materials and devices useful for the implementation
of gates, including improvements in size, potential
manufacturability, cascadability, and especially switching
energy; and (2) the realization that optical systems could
have significant advantages over electronic computers in certain
application areas. These advantages are due primarily
to the optical interconnections, and include the ability to
implement large numbers of interconnecting lines with little
or no regard for their length. This stems primarily from the
fact that electrons interact at a distance whereas photons do
not. These advantages will be discussed in more detail
below.

In this paper we will review some of our work in digital
optical computing, and discuss the possible optical
implementation of a cellular logic processor and some of its
architectural characteristics. A general review of digital
optical computing is given in Ref. 1.

II.  OPTICAL  LOGIC  SYSTEM

An optical logic system can be built out of optical gates
and interconnections. If these interconnections include a
provision for feedback, then clocks and memory can be
constructed in addition to combinatorial logic. These are the
minimum hardware requirements to be able to implement, in
principle, arbitrary digital processing operations. We have
demonstrated an optical logic system that includes these
elements; it allows large numbers of interconnections between
gates. It is described in this section. We also point out that
other digital optical processing systems have been described
[1].

In our system a 2-D array of gates is combined with an
optical holographic interconnection system to create a general
optical sequential logic system. Its inherent 3-D structure
provides for a high degree of interconnection flexibility.
The idea is to take the array of gate outputs and send it
through a holographic system back to the array of gate
inputs (Fig. 1). The holographic system connects the output
of each gate to inputs of other gates, effectively wiring up a
circuit. For ease of manufacture, the holograms can be
generated by (electronic) computer and written out using a
computer plotting device. We have experimentally
demonstrated the concept of this system with a 16-gate circuit. In
this section we will discuss the gate array and interconnection
system.

2-D arrays of optical gates demonstrated to date have
one drawback or another that precludes their use in a practical,
competitive optical logic system. While current
devices can implement 10^5 to 10^6 gates in one array, in most
cases the major drawback is the extremely slow speed of the
devices. (Typical response times are > 1 ms.) Recent progress
in the area of optical bistability, however, provides
hope for fast optical logic systems. To date the demonstrations
have been primarily on individual (single-gate) devices,
but in principle they can be used for 2-D arrays as well.
Gate switching times on the order of ns have been
demonstrated [2], and there is potential for even much faster gates
[3,4] (although other considerations such as power may limit
the usable response time in a system to ~ ns). Many of
these devices are all-optical (intrinsic) in that the signal is
not converted to electrons and then back to photons again
in order to obtain the nonlinearity. This is one of the reasons
for their speed advantage. For a review of optical
bistability, the reader is referred to [5,6].

We have previously described three different optical
interconnection systems for interconnecting the gates [7].
All of them use holograms in conjunction with free-space
propagation. Their characteristics differ and this manifests
itself in the kinds of circuits and processors that can be
implemented most efficiently with each system. Here we
discuss one of the systems, which is a hybrid
space-variant/space-invariant system. This system has the most
general applicability and is the most pertinent to cellular
logic processors. A review of all three systems can be found
in Ref. 8.

Fig. 1. Functional block diagram of sequential optical
logic system.

The hybrid interconnection system represents a basis-set
approach to interconnections. The optical system consists
of two holograms and two or more lenses. The idea is
to define a finite number, M, of distinct interconnection
patterns, and then assemble the circuit using only these M
patterns. A circuit with only one interconnection pattern is
shown in Fig. 2: all gate outputs have exactly the same
interconnection pattern. We generally expect that 1 << M
<< N, where N is the number of gates. Each different
interconnection pattern is essentially stored in only one
place on the hologram, and many gate outputs can use this
same interconnection in a non-interfering manner. In addition,
once the necessary set of interconnection patterns is
defined, it may be possible to determine a smaller basis set
of new interconnection patterns that still achieves all
interconnections [9]. If so, M is reduced even more. The number
of gates and interconnection patterns that are implemented
determine the complexity of the holograms. The hologram
complexity that can be achieved is limited by the capabilities
of recording devices (e.g., computer plotting devices).
Calculations indicate that with current plotting devices, if
there are M = 50 interconnection patterns, then N ~ 10^7
gates can be interconnected [7]. Increasing M will decrease
N, such that MN is constant. (We expect future gate arrays
to have ~ 10^6, possibly up to 10^7, gates.) Thus with this
approach the designer has some minor limitations on the
interconnections which can be used, but he has a potentially
large number of gates at his disposal.
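
The trade-off between M and N can be made concrete with a short calculation. The sketch below assumes the operating point quoted above is M = 50 with N of order 10^7 (the exponent is read from a hard-to-read scan and is an assumption), and treats the product MN as a fixed budget. The names `BUDGET` and `max_gates` are hypothetical.

```python
REFERENCE_M = 50        # interconnection patterns at the quoted operating point
REFERENCE_N = 10**7     # gates at that point (assumed reading of the source)
BUDGET = REFERENCE_M * REFERENCE_N  # MN is held constant


def max_gates(num_patterns):
    """Return the number of gates N supportable with M = num_patterns patterns."""
    if num_patterns < 1:
        raise ValueError("at least one interconnection pattern is required")
    return BUDGET // num_patterns
```

Under these assumptions, a tenfold increase in the number of patterns costs a tenfold reduction in the number of gates, which is why regular circuits with few distinct patterns make the best use of the hologram.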

Of course, with a large enough M, any circuit can be
implemented. However, the potential of this system can be
exploited more fully by implementing circuits with a high
degree of regularity or symmetry. An example is a processor
array. Typically interconnections between processing elements
(PEs) have a considerable amount of regularity, which
can reduce the size of M. Examples include mesh-connected
arrays, pyramids, and hypercubes. The interconnections
within each PE may be completely arbitrary. The fact that
many of the PEs would typically be identical provides a
major reduction in M, since each interconnection pattern is
stored only once. We also point out that whether the
interconnections are global or local has essentially no effect on M
or N.
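
One way to see why regularity keeps M small is to count distinct patterns in a toy model. The sketch below is an illustration under stated assumptions, not the authors' hologram design procedure: it treats the interconnection pattern of a gate as the set of relative offsets from that gate to the gates it drives, and it uses a square array in which every gate drives the gates a fixed distance away along the four axis directions, with no wraparound at the edges.

```python
def count_patterns(size, reach):
    """Count distinct interconnection patterns on a size x size gate array.

    Each gate drives the gates `reach` cells away to the north, south,
    west and east, wherever such a gate exists. The pattern of a gate is
    the set of relative offsets it actually uses, so two gates share a
    pattern exactly when they use the same offsets.
    """
    directions = ((-reach, 0), (reach, 0), (0, -reach), (0, reach))
    patterns = set()
    for r in range(size):
        for c in range(size):
            offsets = frozenset(
                (dr, dc)
                for dr, dc in directions
                if 0 <= r + dr < size and 0 <= c + dc < size
            )
            patterns.add(offsets)
    return len(patterns)
```

In this model an 8 x 8 array and a 64 x 64 array both need the same number of patterns, and so does a 64 x 64 array whose connections span 20 cells instead of 1, which is consistent with the statement that M depends on regularity rather than on array size or connection length.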

We have experimentally demonstrated the concept of
this optical logic system. An array of NOR gates was used
with an interconnection system that uses a single hologram.
A test circuit consisting of 16 gates was implemented. It
comprises a synchronous master-slave flip-flop and a driving
clock consisting of an odd number of inverters in a feedback
loop. This experiment and its results are described in Ref.
10.
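
The reason an odd number of inverters in a feedback loop acts as a clock can be checked with a small simulation. This is a minimal sketch assuming idealized unit-delay inverters (each a single-input NOR gate) updated synchronously. It is not a model of the experimental liquid-crystal device, and the function names are hypothetical.

```python
from itertools import product


def ring_step(state):
    """One time step of a ring of inverters: gate i inverts gate i-1."""
    n = len(state)
    return [1 - state[(i - 1) % n] for i in range(n)]


def has_stable_state(n):
    """Return True if some assignment of outputs leaves the ring unchanged."""
    return any(
        ring_step(list(s)) == list(s) for s in product((0, 1), repeat=n)
    )


def waveform(state, steps):
    """Record the output of gate 0 over the given number of time steps."""
    samples = []
    for _ in range(steps):
        samples.append(state[0])
        state = ring_step(state)
    return samples
```

A ring with an even number of inverters has a stable state and can sit still, while a ring with an odd number has none and must keep switching. For three inverters started from `[1, 0, 1]`, the output of gate 0 repeats with a period of six time steps.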


Fig. 2. Example of only one interconnection pattern
being used for all gate outputs (M=1).


III.  COMPARISON  OF  DESIGN  CONSTRAINTS

The design of processors and computers in any technology
is constrained by the inherent characteristics of the
technology. In this section we compare some of these
constraints for electronic and optical systems and discuss how
they affect system architectures.

Since the development of electronic LSI and VLSI, the
cost of individual gates has constituted only a minor factor
in the overall system cost function. The major concern has
become internal and external communications [11,12]. The
internal wiring network affects the amount of active chip
area available for gates; in current systems it is common for
more than 70% of the chip area to be devoted to
interconnections [13]. Because of the resistance and parasitic
capacitance associated with each on-chip wire, the response time of
a gate and the propagation delay of a wire both become
functions of the length of the wire. Timing and clock skew
become problems because of the differing wire lengths
[14,15]. (Although the wire resistance can be reduced by
process technology, e.g., using thick metal layers, it appears
that the wire lengths will still be a limiting factor in the
system timing considerations [16].) Another restriction related
to interconnections is the limited number of pin-outs, which
becomes more apparent as the number of gates per chip
increases. One result of these restrictions is the need to
minimize the number and length of interconnections.

For optical logic systems, the major design considerations
are admittedly not so well defined as for VLSI, but it
is clear that the cost function is much different. In particular,
most of the communication costs affecting VLSI design
are not associated with optical systems. An optical system
can be made so that all interconnections have the same
length to first order. (For example, this is the case in the
optical logic system described in Sec. II.) Thus synchronization
problems due to clock skew can be eliminated, making
large synchronized systems more feasible. Being able to
synchronize the circuits eliminates the need for handshaking or
other asynchronous techniques which introduce waiting time
for individual circuit elements. The other design constraints
associated with wire length, namely power consumption and
device area utilization, can be avoided with optical systems,
for example by using free-space propagation in the third
dimension for the interconnections.

Pin-outs are not a constraint in optical systems. Optical
systems can accept a large number of parallel inputs and
can generate a large number of parallel outputs. These are
usually in the form of 2-D arrays of data or bits, e.g., bit
planes of images. The careful partitioning of large systems
is then unnecessary, and limitations on concurrent and pipelined
processing due to large I/O requirements are relieved.

Of course, digital optical systems will have constraints
of their own. It appears that these will be primarily
concerned with the gates. We have seen, for the interconnection
system described in Sec. II, that there is a preference for
regular or repeated interconnections. The gates might also
present some design constraints; this may be in the total
number used or in the average repetition rate at which they
are switched. In addition, other limitations may of course
surface as the technology progresses.

Because of these differences in the cost functions of
electronic and optical systems, certain application areas are
specifically well-suited to one or the other. VLSI systems
are being particularly considered for applications which
involve very regular structures and simple data flow that
can be handled with only local communications. An example
is systolic array architectures [15,17], which are well-suited
to many vector-matrix and matrix-matrix operations.
On the other hand, algorithms which inherently require global
communications cannot be conveniently handled by
VLSI, but could, in principle, be implemented with an optical
logic system. Examples of such communication-limited
operations include some fast Fourier transform (FFT) algorithms
which require global communications due to their
butterfly structure [18], and some image processing operations
which will be discussed in Sec. IV.
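
The global communication demanded by the butterfly structure can be illustrated by listing how far apart the paired elements are at each stage of a radix-2 FFT. The sketch below assumes the common in-place radix-2 formulation, in which the stage numbered s pairs each element with the one whose index differs in bit s. The function names are hypothetical, and only the communication pattern is shown, not the arithmetic.

```python
def butterfly_spans(n):
    """Return the index distance between paired elements at each stage.

    n must be a power of two; a length-n transform has log2(n) stages.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    spans = []
    span = 1
    while span < n:
        spans.append(span)
        span *= 2
    return spans


def partner(index, stage):
    """Return the element paired with `index` at the given stage (0-based)."""
    return index ^ (1 << stage)
```

For a 1024-point transform the largest span is 512, half the array, so at least one stage needs connections that no purely local wiring scheme can provide directly.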

IV.  CELLULAR  LOGIC  PROCESSORS

A cellular logic processor uses a PE, or cell, for each
pixel or set of pixels of an image. A full array processor has
in principle a separate PE for each pixel of an image, so that
the number of PEs is equal to or larger than the size of the
largest image it will process. The number of PEs in a
subarray processor is generally smaller than the image size.
The discussion here applies primarily to full array processors,
but the possibility of processing images larger than the
number of PEs will also be considered.
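
A full array processor can be pictured as applying the same logic rule at every pixel at once. The following sketch emulates one such step in software for a binary image; the 4-neighbor window, the zero padding outside the image, and the example edge rule are assumptions chosen for illustration rather than operations taken from the paper.

```python
def cellular_step(image, rule):
    """Apply `rule` at every pixel of a binary image, as if one PE per pixel.

    image -- list of rows, each a list of 0/1 values
    rule  -- function(center, north, south, west, east) returning 0 or 1
    Pixels outside the image are treated as 0.
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [
            rule(pixel(r, c), pixel(r - 1, c), pixel(r + 1, c),
                 pixel(r, c - 1), pixel(r, c + 1))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def edge_rule(center, north, south, west, east):
    """Keep a set pixel only if at least one of its 4-neighbors is clear."""
    return int(center and not (north and south and west and east))
```

Applying `cellular_step(image, edge_rule)` to a filled 3 x 3 square leaves its eight border pixels set and clears the center. Every output pixel depends only on the previous image, so all PEs can operate in the same time step.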

Possible  Optical  Implementations 

Again we use the optical logic system of Fig. 1. We
assume that one gate array can provide ~ 10^6 to 10^7 gates.
We refer to one gate array plus one interconnection unit as
a chip, and note that multiple chips can be connected. We
should point out that the number of chips that can actually
be used in an overall system will depend on improvements in
switching power of the gates and advancements in other
areas. Also, while in principle the system of Fig. 1 can be
made small (~1 cm on a side), we will not consider the
effects of physical size in this paper. Now, an interconnection
unit or units may be used to connect between a small
number of chips. Another method of interconnecting, when
the number of chips is moderate, may be to mosaic multiple
gate arrays into a larger 2-D array, and to use larger holograms
to interconnect both within each gate array and
between gate arrays.

The hybrid or basis-set interconnection system
described in the previous section could be used to implement
a cellular logic processor. Since all PEs are identical (except
perhaps for a small number of additional PEs for other
purposes), the number of interconnection patterns is relatively
small. The gate array will most likely be the limiting factor
in the number of gates per chip. If the PEs do not all fit on
one chip, they may be divided so that k x k PEs are put on
each chip; or, each PE may be distributed over more than
one chip so that each processor chip has a portion of every
PE on it. This is possible because of the parallel I/O of each
chip: conceivably each chip could have 10^6 I/O lines.

Fig. 3. ... block mode. Imaging optics are omitted for clarity.
Gate outputs enter from the left, and the right plane is
sent to the gate inputs. Image pixel inputs are shown,
one to each block (PE). Gates numbered 1 are part of
PE 1, gates numbered 2 are part of PE 2, etc.

A variant of the hybrid interconnection system could
be used. In this case volume (thick) holograms are used,
which could be optical copies of computer-generated holograms.
This provides an increase in optical power efficiency
as well as in the achievable hologram complexity. It also
provides the possibility of copying multiple computer-generated
holograms onto one volume hologram, which
might be used to interconnect a mosaic of 2-D gate arrays.
Again there are two possible ways of organizing the locations
of the PEs.

In one case each PE is physically localized. Topologically
neighboring PEs are placed in physical proximity. The
hologram or gate array(s) are conceptually divided into
blocks, one for each PE (Fig. 3). All gates numbered 1 are
part of PE 1, gates numbered 2 are part of PE 2, etc. We
refer to this as block mode. Communication within a PE is
done by interconnections within each block, and between
PEs is done by the same type of interconnections, only they
pass into neighboring PEs. In this case, communication
within each PE can be arbitrary, between neighboring PEs
is easy, but between distant PEs may be more difficult (i.e.,
it may increase the hologram complexity). The requirements
on the hologram complexity in order to implement a cellular
logic processor are approximately the same in this case as
with the hybrid interconnection system, but since a more
complex hologram can be made, more gates can effectively
be interconnected.

The other extreme distributes each PE over the gate
array(s). In this interleaved mode, corresponding gates, one
from each PE, are physically grouped together (Fig. 4). The
image is input to one group of gates, which are the input
gates of each PE. Within-PE interconnections are then done
by connecting an entire group of gates to another entire
group of gates. Between-PE interconnections can be done
similarly except with a slight misalignment from one group
of gates to the other. Again the hologram complexity dictated
by this system is approximately the same as in the
hybrid system.

Fig. 4. Optical interconnection system for cellular logic
- interleaved mode. Imaging optics are omitted for
clarity. Image inputs are shown, one to an input gate
of each PE. Gates numbered 1 are part of PE 1, etc.
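
The two layouts can be contrasted with a small indexing sketch. It assumes, purely for illustration, a 1-D row of gate positions; the function names and the sizes `num_pes` and `gates_per_pe` are hypothetical and do not come from the paper.

```python
def block_position(pe: int, gate: int, gates_per_pe: int) -> int:
    """Block mode: all gates of one PE occupy contiguous positions."""
    return pe * gates_per_pe + gate


def interleaved_position(pe: int, gate: int, num_pes: int) -> int:
    """Interleaved mode: gate g of every PE is placed in the same group."""
    return gate * num_pes + pe


num_pes, gates_per_pe = 4, 3
block = [0] * (num_pes * gates_per_pe)
inter = [0] * (num_pes * gates_per_pe)
for pe in range(num_pes):
    for gate in range(gates_per_pe):
        # Label each physical position with the (1-based) PE that owns it.
        block[block_position(pe, gate, gates_per_pe)] = pe + 1
        inter[interleaved_position(pe, gate, num_pes)] = pe + 1

print(block)  # PE labels along the array in block mode
print(inter)  # PE labels along the array in interleaved mode
```

In this sketch a within-PE connection from gate g to gate g' in interleaved mode is the same shift, (g' - g) * num_pes, for every PE, and a connection to the neighboring PE is that shift plus one position, which corresponds to the "slight misalignment" described above.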

Architectures and Characteristics

One problem with full-array cellular logic processors
(CLPs) is that there may always be some images to process,
say of size m x m, that are larger than the number of PEs,
say $n^2$. When such an image is processed, if it is processed
in blocks of n x n pixels, one to a PE, then the borders
between blocks can cause problems, especially in iterative
calculations. Incorrect data propagates inward from the
border by an amount proportional to the number of iterations
[19]. One way of avoiding this is by loading blocks of
the image with overlapping boundaries into and out of the
array during every iteration. This significantly slows the
process down in the case of most electronic CLPs; while the
calculations for one iteration may be done in $O(1)$ steps,
loading data into the PEs takes $O(n)$ steps if n PEs are
loaded in parallel. Another way of avoiding this boundary
problem is to store $m^2/n^2$ pixels in each PE. This adds to the
storage requirements and complexity of the PEs; they must
be capable of handling the largest image that will be processed
on the machine.
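
The border effect can be illustrated with a minimal sketch. It assumes a 1-D binary image and a nearest-neighbor OR as the iterated cellular operation; both are simplifications chosen for brevity, and the helper names are hypothetical. It also loads a halo of t pixels once for t iterations, which is a variant of the per-iteration overlap described above.

```python
def step(row):
    """One nearest-neighbor OR iteration; pixels outside the row read as 0."""
    n = len(row)
    return [
        row[i] | (row[i - 1] if i > 0 else 0) | (row[i + 1] if i < n - 1 else 0)
        for i in range(n)
    ]


def iterate(row, t):
    """Apply `step` t times."""
    for _ in range(t):
        row = step(row)
    return row


image = [0] * 16
image[7] = 1          # one foreground pixel just outside the block
lo, hi = 8, 16        # the block of pixels held by the PE array
t = 3                 # number of iterations

full = iterate(image, t)[lo:hi]          # reference: whole image processed
isolated = iterate(image[lo:hi], t)      # block processed without neighbors
wrong = [i for i in range(hi - lo) if full[i] != isolated[i]]

# Load the block with an overlap (halo) of t pixels, then discard the halo.
start, end = max(0, lo - t), min(len(image), hi + t)
with_halo = iterate(image[start:end], t)[lo - start:hi - start]

print(wrong)              # positions (from the block border) that are incorrect
print(with_halo == full)  # whether the overlapped load reproduces the reference
```

In this example the incorrect pixels are exactly the t pixels nearest the block border, consistent with the statement that the error region grows with the number of iterations.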

Another problem with electronic full-array CLPs is
caused by pin-out limitations of LSI and VLSI chips. If a
large array is partitioned into chips with k x k PEs on each
chip, then $O(k)$ pins are needed for interconnections between
PEs, if the number of I/O lines to each PE is a constant
independent of n. If it instead grows with n, as in the case
of a hypercube, for example, the number of pins required
grows faster than $O(k)$. Finally, to avoid the bottleneck of
transferring images into and out of the PEs, as described
above, $k^2$ pins would be needed. We should point out that
while these problems appear to be largely inherent in the
technology, this does not necessarily prevent future clever
solutions from reducing their severity. At times they can be
lived with or avoided to a substantial degree. A
case in point is the MPP [20].
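
The pin-count scaling can be made concrete with a rough sketch. It assumes a 4-connected mesh with one line per boundary PE on each side of the chip; the function name and the `lines_per_link` parameter are illustrative assumptions, not figures from the paper.

```python
def pins_per_chip(k: int, lines_per_link: int = 1) -> dict:
    """Estimate pin counts for a chip holding k x k PEs of a 4-connected mesh."""
    # Each of the 4 chip edges has k boundary PEs, each needing an off-chip link.
    neighbor_pins = 4 * k * lines_per_link      # grows as O(k)
    # One pin per PE is needed to load or unload all pixels in parallel.
    parallel_io_pins = k * k                    # grows as O(k^2)
    return {"neighbor": neighbor_pins, "parallel_io": parallel_io_pins}


for k in (4, 8, 16, 32):
    print(k, pins_per_chip(k))
```

Under these assumptions the parallel image I/O pins overtake the neighbor-link pins once k exceeds 4, which is why the image transfer is the harder bottleneck to remove electronically.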

An optical CLP has the potential of bypassing most of
these problems. They all amount to communications limitations,
either between chips or between processors and
memory. An optical full-array CLP could have direct connections
between each bit in memory and the PE(s) that
correspond to that bit. A gate array that could store on the
order of 512 x 512 bits could have $512^2$ or 262,144 lines
(each with a fanout of 5 for the case of a 4-connected cellular
array), each to the appropriate PEs. Using multiple
chips may permit these numbers to be even larger.

In the case of electronic sub-array machines, similar
limitations exist. Processing speed is limited primarily by
the data rate of the bus between PEs and memory. In addition,
if pixel operations are done by look-up table, this can
put an added load on the bus or on the required storage
within the PEs. These points are discussed in [19]. In the
optics case, data can again be transferred quickly and in
parallel between memory and PEs, so that the data rate of
the transfer is not a significant part of the processing time.

Another possible advantage of optical CLPs is in the
PE-to-PE interconnection network topology. Each PE can
have a larger number of input and output lines (although
additional gates are needed in each PE to select among
the lines). In addition, longer interconnections between PEs
are feasible. For example, the hybrid interconnection system
of Sec. II cannot distinguish between global and local interconnections.
In the other interconnection systems of this
section, global between-PE interconnections do increase the
complexity of the hologram or optics somewhat, but may
still be feasible. Such between-PE interconnections can substantially
reduce the communication time between PEs. For
example, in a simple nearest-neighbor mesh-connected array,
it takes $O(n)$ time for data to be transferred between PEs on
opposite sides of the array. Going to a hypercube, which
connects each PE to PEs at distances of 1, 2, 4, ..., $2^k$ in each
dimension, lowers this communication time to $O(\log_2 n)$; the
number of lines connected to each PE in this case is also
$O(\log_2 n)$ [21].
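
The difference in communication time can be sketched by counting hops along one dimension. The sketch assumes a simple routing rule that only moves forward and uses one link of length $2^i$ per set bit of the distance; this rule and the function names are assumptions for illustration, and bidirectional links could need fewer hops.

```python
def mesh_hops(distance: int) -> int:
    """Nearest-neighbor mesh: one hop per unit of distance."""
    return distance


def hypercube_hops(distance: int) -> int:
    """Links of length 1, 2, 4, ...: one hop per set bit of the distance."""
    return bin(distance).count("1")


n = 512  # PEs along one dimension of the array
worst_mesh = max(mesh_hops(d) for d in range(n))
worst_cube = max(hypercube_hops(d) for d in range(n))
print(worst_mesh, worst_cube)
```

For n = 512 the worst case along one dimension is 511 hops on the mesh and 9 hops (that is, $\log_2 n$) with the power-of-two links.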

Reference 22 classifies different types of communications
between PEs and gives the corresponding communication
time on different network topologies. Some classes
depend on the diameter of the graph representation of the
network. Examples include broadcasting, where one PE
sends a message to many PEs, and condensing, where many
PEs send messages to one PE, such that the messages can be
combined en route to the destination. In both cases a globally
connected array such as a hypercube can perform the
communication in $O(\log n)$ steps, whereas a conventional
mesh requires $O(n)$ steps. Another class of communication
operations is one-to-one tasks, or permutations. Here the
topology, but not the diameter, determines the time, but
again the hypercube requires $O(\log n)$ time, whereas a mesh
may require $O(n^2)$ time.

Such between-PE interconnections can substantially
reduce the computation time for some algorithms and processing
operations used on images in CLPs. Examples of
operations that utilize some of the communication tasks
listed above are the calculation of transforms, moments,
value counting or histogramming, and region property computation.
Full-array CLPs can do pointwise and local operations
in a small number of steps that is independent of
image size. Since the above operations require time $O(\log n)$
or greater, and loading or unloading of image data into PEs
typically requires time that grows with n in the electronics
case, technologies or architectures that reduce these times
could have a significant impact on processing times for many
image processing algorithms.


CONCLUSIONS 

In conclusion, we have discussed possible optical systems
for the implementation of digital cellular logic processors.
Current optical technology in conjunction with anticipated
progress in research may make the construction of
such a processor feasible. Looking at the underlying physical
characteristics of electronic and optical digital processors
reveals that the two are quite different, and this has an
effect on the algorithms and architectures that can be implemented
with each. In the case of cellular logic processors the
communication and interconnection capabilities of optics
could provide for substantially reduced computation time for
image processing tasks.

2.3  Binary  Image  Algebra


Binary  image  algebra  (BIA)  forms  the  mathematical  background  for  software  and  hardware 
systems  suitable  for  optical  digital  computing.  Parallel  algorithms  for  optical  cellular  logic  and 
symbolic  substitution  processors  can  be  formalized  as  compact  BIA  expressions.  BIA  also  leads 
to  the  architectural  design  of  digital  optical  cellular  image  processors  (DOCIP)  which  are  well- 
suited  to  executing  the  parallel  algorithms.  The  following  three  papers  summarize  our  recent 
work  in  these  areas. 

The DOCIP is a 2-D, page-oriented array of individual processors located at every pixel of
a large image. The attached papers by K.S. Huang, B.K. Jenkins and A.A. Sawchuk, “Binary
Image Algebra and Optical Cellular Logic Processor Design”, from Computer Vision, Graphics,
and Image Processing, Vol. 45, pp. 295-345, 1989; “Image Algebra Representation of Parallel
Optical Binary Arithmetic”, from Applied Optics, Vol. 28, No. 6, pp. 1263-1278, March 15,
1989; and “A Cellular Hypercube Architecture for Image Processing”, from Applications of Digital
Image Processing X, Proc. of SPIE-The International Society for Optical Engineering, Vol. 829,
San Diego, California, August, 1987, summarize these concepts and their algebraic background.

Techniques for digital optical cellular image processing are presented. A binary image
algebra (BIA), built from five elementary images and three fundamental operations, serves as
its software and leads to a formal parallel language approach to the design of parallel binary
image processing algorithms. Its applications and relationships with other computing theories
demonstrate that BIA is a powerful systematic tool for formalizing and analyzing parallel
algorithms. Digital optical cellular image processors (DOCIPs), based on cellular automata
and cellular logic architectures, serve as its hardware and implement parallel binary image
processing tasks efficiently. An algebraic structure provides a link between the algorithms of
BIA and the architectures of DOCIP. Optical computing suggests an efficient and high-speed
implementation of the DOCIP architectures because of its inherent parallelism and 3D global
free interconnection capabilities. Finally, the instruction set and the programming of the
DOCIPs are illustrated. © 1989 Academic Press, Inc.


1. INTRODUCTION

In this paper we combine studies of architectures, algorithms, mathematical
structures, and optics to show that: (1) an image algebra extending from mathematical
morphology [2]-[5] can lead to a formal parallel language approach to the design
of image processing algorithms; (2) cellular automata are appropriate models for
parallel image processing machines [6, 7]; (3) an algebraic structure serves as a
framework for both algorithms and architectures of parallel image processing; and
(4) the parallel processing and global interconnection advantages of optical computing
may be useful in efficiently implementing image algebra with cellular logic
architectures.

The purpose of the image algebra approach in this paper is the development
of a programming language for a specific parallel architecture, namely a digital
optical cellular image processor (DOCIP). The binary image algebra (BIA) described
here is based on a set of three specific fundamental operations. These
fundamental operations are the key operations in the instruction set of the DOCIP
machine. The BIA provides a decomposition of general operations, including
low-level image processing operations, into the three fundamental operations of the
instruction set. This decomposition is inherently parallel and provides a direct
mapping to the machine architecture.

In this section, we first review previous work on image algebra, cellular automata,
and cellular logic architectures, then we define the algebraic structure and outline
the detailed discussion that follows.

Previous  Work  on  Image  Algebra 

During the past few years, numerous papers have used an algebraic approach to
aid in image processing [2-5, 8-10]. Among them, morphological image algebra has
the closest relation to binary image algebra (BIA). Many papers describe either
specific theoretical aspects of mathematical morphology or application-specific
morphological algorithms [11-18]. The applications of mathematical morphology
have been fruitful. In this paper we adapt it to provide the following features:

1. A simplified mathematical structure. Mathematical morphology comprises
two branches, integral geometry and geometrical probability, plus a few collateral
ancestors (harmonic analysis, stochastic processes, algebraic topology) [2]. The
mathematical details and formal proofs in morphology are often intricate and
involve advanced set theoretic and topological concepts which are not always
necessary for engineering applications.

2. A complete algebraic theory. Mathematical morphology defines some algebraic
operators and utilizes some algebra. With our adaptation, we would like to
answer the following questions:

•  What  is  the  algebraic  definition  of  this  mathematical  morphology? 

•  How  powerful  is  this  mathematical  morphology? 

•  What  is  the  definition  of  a  transformation?  Morphological  transformations 
are  constrained  by  four  principles  [2];  here  we  introduce  a  complete  definition  of 
image  transformations. 

3. Clarification of its relationship to other areas. We define its relationship to
linear system theory, image processing, and common computing techniques including
boolean logic, cellular logic, and algebraic structures.

Here we develop a simple, unified, and complete parallel binary image processing
theory based on an algebraic structure: binary image algebra (BIA). In BIA,
parallel binary image processing algorithms (including parallel numerical computations)
can be written as compact algebraic expressions where an algebraic symbol
represents an image (not a pixel) or an image operation (not a pixel-wise operation).
A complete algebraic system comprises five elementary images, which can be
combined to generate any image, and three fundamental operations, which can be
cascaded to form any image transformation. (In fact, one can
define four elementary images and two fundamental operations that are sufficient;
however, in this paper we will not consider them since they are more difficult to
use.)

There are other image algebras, each with its own characteristics [8, 9]. Because of
our intended application to a highly parallel computing machine with simple
processing elements and a reduced instruction set, we utilize a BIA with only three
fundamental operations that can implement any binary image transformation. For
example, the counting function, which gives the number of pixels having a certain
level, is considered a mapping from a picture type of operand to a number type of
operand in references [8] and [9]; in BIA numbers are also represented as images
[19]. BIA suggests several simple but fast parallel image algorithms and a parallel
image processing architecture with a very low cell complexity.

Fig. 1. A sequential process of cellular logic operations (CLO). The value $X'(i, j)$ is determined by
the corresponding $X(i, j)$ in the original image along with the values of its neighbors.

Previous  Work  on  Cellular  Logic  Architectures 

To match BIA parallel algorithms to cellular logic architectures in a transparent
way, we characterize a cellular automaton by an algebraic structure, as BIA does. The
cellular logic computer was first inspired by the writings of von Neumann [20, 21]
on cellular automata. A sequential process of cellular logic operations is described
in Fig. 1. Reviews of cellular image processors can be found in Refs. [21-25].
Many cellular computers have been constructed previously for implementing cellular
logic operations, and some ideas for extending the nearest-neighbor connected
cellular logic computers for improving speed and flexibility have been proposed [24].
These architectures include: (1) the cellular string (Fig. 2(a)); (2) the cellular array
(Fig. 2(b)); and (3) the cellular hypercube (Fig. 2(c)) and the cellular pyramid (Fig.
2(d)). These three classes of architecture share a common feature in the simplicity and
regularity of interconnecting simple processing elements and represent an interconnection
topology in 1D, 2D, and 3D, respectively. The 3D case is difficult to
implement on a planar VLSI chip [24, 26, 27], but may be realizable by a digital
optical system [28, 29]. Two promising architectures based on digital optical cellular
image processors (DOCIPs), DOCIP-array and DOCIP-hypercube, are presented
below as a means of implementing BIA effectively.

Definition  of  Algebraic  Structure 

An algebraic structure (or algebra) [30-32] is a pair (or system) $A = (S, F)$,
where

• $S$ is a set, and

• $F$ is a family of operations which are functions,

$f: S^k \to S$,

and $k$ is a finite nonnegative integer.

Remark. For any finite nonnegative integer $k$, we define a $k$-ary operation on $S$
as an operation which is a function $f: S^k \to S$. Thus, a unary (or 1-ary) operation
on $S$ is simply a function on $S$ to $S$. A binary (or 2-ary) operation on $S$ is a
function on $S^2$ to $S$. For completeness, we define a nullary (or 0-ary) operation on
$S$ to be a particular element of $S$.


Fig. 2. (a) A cellular string. It requires only a 1D interconnection geometry. Each cell connects only
with its two nearest cells. (b) A cellular array. It requires a 2D interconnection geometry. Each cell
connects with its 4 or 8 nearest cells (the panel shows the connections in the 4-connected and in the
8-connected cellular array). (c) A 1-dimensional cellular hypercube [24]. Each cell connects
with cells at distances 1, 2, 4, 8, ..., $2^k$ from it. Here, only the connections with distances 1, 2, and 4 are
shown. (d) A 2-dimensional cellular pyramid. It consists of stages of arrays with connections between two
adjacent stages and is most efficiently implemented with a 3D interconnection geometry.

Therefore, the problem to be solved is essentially to find an “appropriate”
algebraic structure $(S, F)$ for parallel binary image processing, i.e., to search for $S$
and $F$, and its “efficient” hardware implementation.

Outline 

Section 2 contains the framework of BIA: Subsection 2.1 gives the basic definitions;
Subsection 2.2 presents two fundamental principles which prove the completeness
of BIA.

Section  3  describes  some  applications  of  BIA:  Subsection  3.1  reviews  basic 
properties  of  images  and  image  transformations  and  derives  from  them  some 
standard  image  operations;  Subsection  3.2  gives  some  examples  of  special  cases; 
Subsection 3.3 gives some useful theorems and examples for low-level vision
operations, including morphological filtering, shape recognition, “salt” and “pepper”
noise removal, and size and location verification.

Section 4 discusses the relationship of BIA and other computing theories:
Subsection 4.1 describes the relationship with boolean logic; Subsection 4.2 describes
the relationship with symbolic substitution and cellular logic; Subsection 4.3
describes the relationship with linear shift invariant system theory, convolution, and
correlation; Subsection 4.4 describes some standard algebraic structures supported
by BIA.

Section  5  presents  the  implementation  of  BIA  on  digital  optical  cellular  image 
processors  (DOCIPs):  Subsection  5.1  gives  the  algebraic  description  of  the  DOCIPs 
which  have  the  same  algebraic  structure  as  BIA;  Subsection  5.2  gives  the  general 
description  of  the  DOCIPs. 

Finally,  the  programming  of  the  DOCIPs  is  illustrated  in  Section  6. 

2.  BINARY  IMAGE  ALGEBRA  (BIA)  FUNDAMENTALS 

The  overall  philosophy  of  BIA  is: 

•  An  image,  but  not  a  pixel,  is  an  object.  For  parallel  languages  and  machines 
for  image  processing,  images  can  be  considered  as  primitive  variables  for  simplifying 
the  design. 

• Complex image processing operations can be reduced to simple instructions.
Although image processing operations appear complex, the fundamental interactions
and the elementary components in a system are very simple.

Thus,  BIA  begins  by: 

1.  Defining  the  universal  image  as  the  working  space  for  images  and  their 
image  transformations. 

2.  Defining  elementary  images  which  can  be  combined  to  generate  any  image. 

3.  Defining  fundamental  operations  which  can  be  cascaded  to  form  complex 
operations. 

4.  Defining  an  image  processing/analysis  algorithm  design  as  the  choice  of 
“good”  (or  “appropriate”)  reference  images  and  transformations. 

A reference image can be any image and is a generalization of structuring elements
in mathematical morphology [2]. Reference images contain some predefined image
property (or information); image transformations (or operations) are used for
measuring the image property from an input image. Image description, image
information extraction, or image property measurement is done by using reference
images to model or transform the original image to a final state which reveals the
desired information or is used to detect the desired properties easily.

Here we give the algebraic structure of BIA first, and then we provide definitions
and present two fundamental principles which allow us to generate any reference
image and to implement any image transformation. Ideally, BIA may be further
generalized to GIA (general image algebra) which deals with grey-level and complex-valued
images.


2.1.  Definitions 

Definition of Binary Image Algebra (BIA). Binary image algebra is
an algebra with an image space $S$, which is the power set $P(W)$ of a predefined
universal image $W$, and a family $F$ of operations including three fundamental
operations ($\oplus$, $\cup$, $\bar{\ }$), which are non 0-ary operations, and five elementary images
($I$, $A$, $A^{-1}$, $B$, $B^{-1}$), which are 0-ary operations. Symbolically,

$$\mathrm{BIA} = (P(W);\ \oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1}),$$

i.e., $S = P(W)$ and $F = (\oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1})$. The image space $S$ and the
family $F$ of operations will be derived in the following.

Basic  Definitions 

In general, a binary digital image is defined as a function $f$ that maps each
spatially sampled grid point $(x, y)$ of the picture on an orthogonal coordinate
system onto the set composed of two elements: 1 (i.e., white, foreground point or
image point) and 0 (i.e., black or background point). However, it will be more
convenient for our algebra if we use a set of the coordinates of image points (1's) to
specify an image. In this paper, an image is treated as the set of coordinates of
image points (i.e., foreground points or pixels that have value 1). We begin the
description of BIA by defining our artificial universe:

Definition 2.1 (universal image). The universal image is the set
$W = \{(x, y) \mid x \in Z_n,\ y \in Z_n\}$, where $Z_n = \{0, \pm 1, \pm 2, \ldots, \pm n\}$ and $n$ is a
positive integer (Fig. 3).

Remark. “$\in$” means “belongs to.” Notice that given $n$, the universal image
defines the domain of our images. In fact, for an image with size larger than
$(2n+1) \times (2n+1)$ (the size of the universal image), we need to increase the size of
the universal image or decompose the tested image into subimages whose sizes are
smaller than the size of the universal image. For simplicity, we
only consider the square tessellation of images. To deal with nonsquare (e.g.,
hexagonal) tessellations, we can simply redefine the universal image to be the set of
grid points corresponding to the new tessellation pattern.

Definition 2.2 (image space). The image space is the power set (the set of all
subsets) of the universal image, i.e., $S = P(W)$.

Definition 2.3 (image). A set $X$ is an image if and only if $X$ is an element of
the image space $S$, i.e., $X$ is a subimage (subset) of the universal image $W$.
Symbolically,

$X$ is an image $\leftrightarrow X \in S \leftrightarrow X \subseteq W$.

Fig. 3. The universal image $W$. It has $(2n+1) \times (2n+1)$ image points and $n$ is a positive integer.

Remark. “$\subseteq$” means “is included in.” There exist $2^{(2n+1) \times (2n+1)}$ different
images. Three terms related to images are defined:

1. Size (or area) of an image $X$, denoted as $\#(X)$, is the cardinality (i.e., the
number of elements) of the image $X$.

2. Foreground of an image $X$, simply denoted as $X$, refers to those pixels
with value 1.

3. Background of an image $X$, denoted as the complement $\overline{X}$ (Definition 2.6),
refers to those pixels with value 0.

Once  we  know  the  foreground  of  an  image,  the  background  of  this  image  is  well 
defined  (since  the  universal  image  is  given  first).  Thus,  the  foreground  is  sufficient  to 
specify  an  image. 

Definition 2.4 (image point (foreground point)). A point $(x, y)$ is an image
point of an image $X$ if and only if $(x, y)$ is an element of the set $X$.

Remark. The largest image is the universal image $W$ and consists of $(2n+1) \times
(2n+1)$ image points, i.e., $\#(W) = (2n+1) \times (2n+1)$; the smallest image is the
null image $\phi$ (defined as the complement $\phi = \overline{W}$) and has no image points, i.e.,
$\#(\phi) = 0$.


Definition 2.5 (image transformation). A transformation $T$ is an image transformation
if and only if $T$ is a function mapping from the image space $S$ to the
image space $S$.


Remark. There exist

$$2^{(2n+1) \times (2n+1) \times 2^{(2n+1) \times (2n+1)}}$$

image transformations.
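
These counts can be checked for a small universal image with a short sketch. The function names are illustrative assumptions; the only inputs are the formulas stated above, with a transformation counted as any function from the image space to itself.

```python
def num_points(n: int) -> int:
    """Number of grid points in the universal image W."""
    return (2 * n + 1) * (2 * n + 1)


def num_images(n: int) -> int:
    """Number of subsets of W, i.e. the size of the image space S = P(W)."""
    return 2 ** num_points(n)


def num_transformations(n: int) -> int:
    """Number of functions from S to S."""
    return num_images(n) ** num_images(n)


n = 1  # the smallest universal image, 3 x 3
print(num_points(n), num_images(n))
# The count of transformations equals 2 raised to (points * images).
print(num_transformations(n) == 2 ** (num_points(n) * num_images(n)))
```

Even for the 3 x 3 universal image there are 512 images and $2^{4608}$ transformations, which is why a small complete set of fundamental operations is attractive.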
Fig. 4. An example of fundamental operations: complement $\bar{\ }$, union $\cup$, and dilation $\oplus$.

Definition 2.6 (three fundamental operations). There are three fundamental
operations (Fig. 4):

1. Complement of an image $X$:

$$\overline{X} = \{(x, y) \mid (x, y) \in W \wedge (x, y) \notin X\}.$$

2. Union of two images $X$ and $R$:

$$X \cup R = \{(x, y) \mid (x, y) \in X \vee (x, y) \in R\}.$$

3. Dilation of two images $X$ and $R$:

$$X \oplus R = \begin{cases} \{(x_1 + x_2,\ y_1 + y_2) \in W \mid (x_1, y_1) \in X,\ (x_2, y_2) \in R\}, & (X \neq \phi) \wedge (R \neq \phi), \\ \phi, & \text{otherwise.} \end{cases}$$
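
A minimal sketch of the three fundamental operations follows, assuming images are represented as Python `frozenset`s of coordinate pairs, in line with the set view of images adopted above. The function names are illustrative, not taken from the paper.

```python
from itertools import product


def universal_image(n):
    """W = {(x, y) | x, y in {-n, ..., n}}."""
    r = range(-n, n + 1)
    return frozenset(product(r, r))


def complement(X, W):
    """All points of W that are not image points of X."""
    return W - X


def union(X, R):
    """Points that belong to X or to R."""
    return X | R


def dilation(X, R, W):
    """Sums of points of X and R that stay inside W.

    If X or R is the null image, the comprehension is empty, which matches
    the 'otherwise' branch of the definition.
    """
    return frozenset(
        (x1 + x2, y1 + y2)
        for (x1, y1) in X
        for (x2, y2) in R
        if (x1 + x2, y1 + y2) in W
    )


W = universal_image(2)
X = frozenset({(0, 0), (1, 0)})
R = frozenset({(0, 0), (0, 1)})
print(sorted(dilation(X, R, W)))
# Intersection is not fundamental; it follows from complement and union.
print(sorted(complement(union(complement(X, W), complement(R, W)), W)))
```

The last line illustrates how a derived operation (here intersection, via De Morgan's law) can be composed from the fundamental ones.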


Remark. "$\wedge$" means "and," and "$\vee$" means "or." Note that $X$ usually represents an input or data image and $R$ is a reference image. Mathematical morphology, where the dilation is defined as the union of all translations of $X$ by all image points in $R$, does not consider the null image in the dilation operation. With this generalization we have a complete theory; other image algebras offer less demonstration of their capability for implementing any image transformation. Other image operations could also be defined as the fundamental operations instead of these three. We choose these three operations for their simplicity and for the simple software design and hardware implementation that result. As shown later, these three operations may be implemented by a 2D optical gate array with 3D interconnections.
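The three fundamental operations map directly onto ordinary set operations. The following is a minimal Python sketch, assuming an image is stored as a frozenset of $(x, y)$ coordinates inside a window $W$ of half-width $n$; the function names (`universal_image`, `complement`, `union`, `dilate`) are illustrative choices and do not come from the original text.

```python
from itertools import product

def universal_image(n):
    """Universal image W: every point of the (2n+1) x (2n+1) window."""
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    """Points of W that are not in X."""
    return frozenset(W - X)

def union(X, R):
    """Points that are in X or in R."""
    return frozenset(X | R)

def dilate(X, R, W):
    """All pairwise coordinate sums that fall inside W.

    If either operand is the null image, the comprehension is empty,
    so the null-image case of Definition 2.6 needs no special branch.
    """
    sums = {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R}
    return frozenset(sums & W)

W = universal_image(2)                 # 5 x 5 window (assumed size)
X = frozenset({(0, 0), (1, 0)})        # data image
R = frozenset({(0, 0), (0, 1)})        # reference image

print(sorted(dilate(X, R, W)))         # dilation of X by R
print(len(complement(X, W)))           # number of background points
```

Sums that leave the window are discarded, which is how the sketch reads the condition "$\in W$" in the definition of dilation.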


Definition 2.7 (elementary images). These elementary images are constant images, i.e., 0-ary operations. Each elementary image has only one image point. There are five elementary images:

1. $I = \{(0,0)\}$, consisting of an image point at the origin

2. $A = \{(1,0)\}$, consisting of an image point right of the origin

3. $A^{-1} = \{(-1,0)\}$, consisting of an image point left of the origin

4. $B = \{(0,1)\}$, consisting of an image point above the origin

5. $B^{-1} = \{(0,-1)\}$, consisting of an image point below the origin.

Remark. In fact, these five elementary images could be reduced to four elementary images, because $I = A^0 = A \oplus A^{-1} = B^0 = B \oplus B^{-1}$.

Definition 2.8 (reflected reference image). Given a reference image $R$, which is a predefined image containing some desired image property or image information, its reflected image is defined as

$$\check{R} = \{(-x,-y) \mid (x,y) \in R\}.$$

Remark. In many useful cases the reference image $R$ is symmetric; then $\check{R} = R$.

2.2. Two Fundamental Principles

Two fundamental principles basically define the binary image algebra (BIA). Before stating these two principles, we give some preliminary results.

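Lemma 2.1.

$$\overline{(\overline{X} \oplus \check{R}) \cup (X \oplus \check{\overline{R}}) \cup \overline{I}} = \begin{cases} I & \text{if } X = R \\ \emptyset & \text{otherwise} \end{cases}$$

$\forall X, R \in P(W)$, where $I = \{(0,0)\}$ is an elementary image, $\check{R}$ is the reflected reference image of $R$, and "$\forall$" means "for all."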

Proof.  Appendix  A. 
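Lemma 2.1 can be checked by brute force on a small window. The sketch below is an illustration rather than the proof of Appendix A: it assumes a $3 \times 3$ window ($n = 1$), stores images as frozensets of coordinates, and uses the hypothetical helper names `match` and `all_images`.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def reflect(R):
    """Reflected image: every point mirrored through the origin."""
    return frozenset((-x, -y) for (x, y) in R)

def match(X, R, W):
    """Left-hand side of Lemma 2.1, using only complement, union, dilation."""
    I = frozenset({(0, 0)})
    misses = dilate(complement(X, W), reflect(R), W)   # origin set if R is not inside X
    extras = dilate(X, reflect(complement(R, W)), W)   # origin set if X is not inside R
    return complement(misses | extras | complement(I, W), W)

def all_images(W):
    """Enumerate every image in P(W): 2**9 = 512 images when n = 1."""
    points = sorted(W)
    for mask in range(1 << len(points)):
        yield frozenset(p for k, p in enumerate(points) if (mask >> k) & 1)

W = universal_image(1)
I = frozenset({(0, 0)})
references = [frozenset(), W, I, frozenset({(1, 0), (0, 1), (-1, -1)})]

# For each chosen R, compare against every possible X.
lemma_holds = all(
    match(X, R, W) == (I if X == R else frozenset())
    for R in references
    for X in all_images(W)
)
print(lemma_holds)
```

The null and universal images are included among the test references because they exercise the null-image branch of the dilation.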

Remark. This lemma says that if the image $X$ matches the image $R$, then the origin (central pixel) of the output image has value "1"; otherwise it is always "0."


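Theorem 2.1. Any image transformation $T: P(W) \to P(W)$ can be expressed as

$$T(X) = \bigcup_{i=1}^{k} \left\{ \overline{(\overline{X} \oplus \check{R}_i) \cup (X \oplus \check{\overline{R_i}}) \cup \overline{I}} \oplus Q_i \right\},$$

where $k \le \#(P(W))$, $R_i$ and $Q_i$ are the reference images used to form any desired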
image transformation, and $\#(P(W)) = 2^{(2n+1)\times(2n+1)}$.

Proof. Appendix B.

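Theorem 2.2. Any image can be represented as

$$X = \bigcup_{(i,j) \in X} A^i B^j,$$

where $A^i B^j = A^i \oplus B^j$,

$$A^i = \underbrace{A \oplus A \oplus \cdots \oplus A}_{i} = \{(i,0)\} \quad \text{if } i > 0,$$

$$A^i = \underbrace{A^{-1} \oplus A^{-1} \oplus \cdots \oplus A^{-1}}_{-i} = \{(i,0)\} \quad \text{if } i < 0,$$

and $A$, $B$, $A^{-1}$, $B^{-1}$ are the elementary images defined in Definition 2.7.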

Proof.  Appendix  C. 
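Theorem 2.2 can be illustrated by rebuilding an image from the elementary images alone. This is a minimal sketch under the assumption that images are frozensets of coordinates; `power` and `build` are hypothetical helper names, and $A^0 = B^0 = I$ is taken from the remark after Definition 2.7.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

# The five elementary images of Definition 2.7.
I = frozenset({(0, 0)})
A, A_INV = frozenset({(1, 0)}), frozenset({(-1, 0)})
B, B_INV = frozenset({(0, 1)}), frozenset({(0, -1)})

def power(E, E_inv, i, W):
    """E^i as |i| successive dilations by E (i > 0) or by E^-1 (i < 0); E^0 = I."""
    out = I
    for _ in range(abs(i)):
        out = dilate(out, E if i > 0 else E_inv, W)
    return out

def build(points, W):
    """Union of A^i (+) B^j over the listed points, as in Theorem 2.2."""
    X = frozenset()
    for (i, j) in points:
        X = X | dilate(power(A, A_INV, i, W), power(B, B_INV, j, W), W)
    return X

W = universal_image(3)
target = frozenset({(2, 1), (-1, 3), (0, 0), (-3, -2)})
print(build(target, W) == target)
```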

Principle 1 (fundamental principle of image transformations). Any image transformation $T$ can be implemented by using appropriate reference images $R$ and the three fundamental operations: (1) complement $\overline{X}$ of an image $X$, (2) union $\cup$ of two images, (3) dilation $\oplus$ of two images.

Proof.  It  follows  from  Theorem  2.1. 
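To make Principle 1 concrete, the sketch below realises an arbitrary transformation through the expansion of Theorem 2.1. It assumes the natural choice $Q_i = T(R_i)$ with $R_i$ ranging over all of $P(W)$; that choice, the $3 \times 3$ window, the example transformation `transpose`, and the helper names are assumptions made for illustration, since Appendix B is not reproduced here.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def reflect(R):
    return frozenset((-x, -y) for (x, y) in R)

def match(X, R, W):
    """Lemma 2.1 expression: I if X == R, the null image otherwise."""
    I = frozenset({(0, 0)})
    inner = (dilate(complement(X, W), reflect(R), W)
             | dilate(X, reflect(complement(R, W)), W)
             | complement(I, W))
    return complement(inner, W)

def all_images(W):
    points = sorted(W)
    for mask in range(1 << len(points)):
        yield frozenset(p for k, p in enumerate(points) if (mask >> k) & 1)

def realise(T, W):
    """Return a function that evaluates T via the union of Theorem 2.1."""
    table = [(R, T(R)) for R in all_images(W)]       # pairs (R_i, Q_i = T(R_i))
    def T_bia(X):
        out = frozenset()
        for R, Q in table:
            out = out | dilate(match(X, R, W), Q, W)  # Q if X == R, else null
        return out
    return T_bia

def transpose(X):
    """Example transformation (hypothetical): swap the two coordinates."""
    return frozenset((y, x) for (x, y) in X)

W = universal_image(1)
T_bia = realise(transpose, W)
samples = [frozenset(), W, frozenset({(1, 0)}),
           frozenset({(1, 0), (1, 1), (-1, 0)})]
print(all(T_bia(X) == transpose(X) for X in samples))
```

The table has $\#(P(W)) = 512$ entries even for this tiny window, which shows why Principle 1 is paired with more economical ways of generating reference images.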

In  order  to  use  Principle  1  efficiently  in  practice,  we  invoke  Principle  2  for  the 
generation  of  reference  images. 

Principle 2 (fundamental principle of reference images). Any reference image $R$ can be generated from elementary images ($I$, $A$, $A^{-1}$, $B$, $B^{-1}$) by using the three fundamental operations.

Proof.  It  follows  from  Theorem  2.2. 

Therefore,  by  the  above  principles,  we  can  represent  BIA  as: 

$$\mathrm{BIA} = (P(W);\ \oplus,\ \cup,\ \overline{\;\cdot\;},\ I,\ A,\ A^{-1},\ B,\ B^{-1}).$$

3.  DEVELOPMENT  OF  BINARY  IMAGE  ALGEBRA  (BIA) 

BIA  can  have  many  applications  in  character  recognition,  industrial  inspection, 
medical  image  processing,  and  scientific  computation.  In  this  section  we  first  review 
the  basic  properties  of  images  and  image  transformations,  define  11  standard 
operations, and give some special cases of dilation [2-5, 33-36]. Then we summarize
four  theorems  and  some  examples  for  binary  image  processing. 

This section is primarily a survey of binary image processing algorithms with implementation using BIA fundamental operations. These fundamental operations were chosen because they form an efficient basis for the instruction set of an optically based cellular image processor. This survey serves as a description of a parallel language for controlling the processor and of how it is compiled into low level instructions. The use of BIA for parallel numerical computation is described in [19].

Fig. 5. The 4-neighborhood and 8-neighborhood of an image point $(x, y)$: (a) the 4-neighborhood of $(x, y)$; (b) the 8-neighborhood of $(x, y)$.


3.1. Basic Properties of Images and Image Transformations

Definition 3.1 (connectivity in images). 1. 4-neighbor and 8-neighbor: An image point $(x, y)$ in an image $X$ can have two types of neighbors:

(a) An image point $(i, j)$ is a 4-neighbor of $(x, y)$ $\leftrightarrow$ $(i, j) \in \{(x \pm 1, y), (x, y \pm 1)\}$.

Remark. $\{(x, y), (x \pm 1, y), (x, y \pm 1)\}$ is called the 4-neighborhood of $(x, y)$ and $N_4 \equiv \{(0,0), (0, \pm 1), (\pm 1, 0)\} = I \cup A \cup A^{-1} \cup B \cup B^{-1}$ (Fig. 5(a)).

(b) An image point $(i, j)$ is an 8-neighbor of $(x, y)$ $\leftrightarrow$ $(i, j) \in \{(x \pm 1, y), (x, y \pm 1), (x \pm 1, y \pm 1)\}$.

Remark. $\{(x, y), (x \pm 1, y), (x, y \pm 1), (x \pm 1, y \pm 1)\}$ is called the 8-neighborhood of $(x, y)$ and $N_8 \equiv \{(0,0), (0, \pm 1), (\pm 1, 0), (\pm 1, \pm 1)\}$ (Fig. 5(b)).

2. 4-connected and 8-connected:

(a) Two image points $(x, y)$ and $(i, j)$ of an image $X$ are 4-connected $\leftrightarrow$ there exists a sequence of image points $(x, y) = (x_0, y_0), (x_1, y_1), \ldots, (x_m, y_m) = (i, j)$, where $(x_k, y_k)$ is a 4-neighbor of $(x_{k-1}, y_{k-1})$ and $(x_k, y_k) \in X$, $1 \le k \le m$.

(b) Two image points $(x, y)$ and $(i, j)$ of an image $X$ are 8-connected $\leftrightarrow$ there exists a sequence of image points $(x, y) = (x_0, y_0), (x_1, y_1), \ldots, (x_m, y_m) = (i, j)$, where $(x_k, y_k)$ is an 8-neighbor of $(x_{k-1}, y_{k-1})$ and $(x_k, y_k) \in X$, $1 \le k \le m$.

Remark 1. "4-connected in $X$" and "8-connected in $X$" are equivalence relations (reflexive, symmetric, and transitive).

Remark 2. For any image point $(x, y)$ in a nonnull image $X$, the set of $(i, j)$ such that $(x, y)$ and $(i, j)$ are 4-connected (or 8-connected) is called a 4-connected (or 8-connected) component of $X$. A 4-connected (or 8-connected) component of $X$ is just an equivalence class in $X$ under the equivalence relation "4-connected (or 8-connected) in $X$." Thus, a collection of 4-connected (or 8-connected) components of $X$ forms a partition of $X$, i.e., the set of all 4-connected (or 8-connected) components $\{X_i\}_{i \in I}$ (where $I$ is the index set of connected components) is a family of nonnull subimages of $X$ and has the following properties:

(a) $X_i \neq \emptyset$ for all $i \in I$.

(b) $X_i \cap X_j = \emptyset$ for all $i \neq j$, $i, j \in I$. ($X_i \cap X_j = \overline{\overline{X_i} \cup \overline{X_j}}$ as defined in Definition 3.3.)

(c) $X = \bigcup_{i \in I} X_i$.

Fig. 6. The 4-connected component and 8-connected component of an image: the image $X$; (a) a 4-connected component of $X$; (b) an 8-connected component of $X$.

Figure  6(a)  shows  a  4-connected  component  in  an  image  X  and  Fig.  6(b)  shows  an 
8-connected  component  in  X. 
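The partition into connected components can be computed with the BIA operations themselves. The sketch below grows a seed point by repeated dilation with $N_4$ or $N_8$, intersected with $X$ at each step, until nothing changes; this fixed-point iteration is a common technique assumed here for illustration and is not spelled out in the text. The names `component` and `components` are hypothetical.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

N4 = frozenset({(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)})
N8 = frozenset(product((-1, 0, 1), repeat=2))

def component(X, seed, N, W):
    """Grow the seed by dilation with N, restricted to X, until stable."""
    comp = frozenset({seed}) & X
    while True:
        grown = dilate(comp, N, W) & X      # dilation limited to points of X
        if grown == comp:
            return comp
        comp = grown

def components(X, N, W):
    """Partition X into its connected components (N4 or N8 connectivity)."""
    remaining, parts = set(X), []
    while remaining:
        part = component(X, min(remaining), N, W)
        parts.append(part)
        remaining -= part
    return parts

W = universal_image(3)
X = frozenset({(0, 0), (1, 0), (2, 1), (-2, -2)})
print(len(components(X, N4, W)), len(components(X, N8, W)))
```

The point $(2, 1)$ touches $(1, 0)$ only diagonally, so it forms its own component under 4-connectivity and joins its neighbor under 8-connectivity.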

Remark 3. If an image $X$ has $l$ 4-connected (or 8-connected) components, there are $l$ distinct equivalence classes in $X$. Each equivalence class $X_i$ can be represented by an image point in $X_i$. Thus, we may use $l$ distinct image points which belong to $l$ different 4-connected (or 8-connected) components to represent the classes of the image $X$.

Remark 4. In dealing with connectedness in both $X$ and $\overline{X}$, to avoid the "connectivity paradox" [33], it is preferable to use opposite types of connectedness for $X$ and $\overline{X}$, i.e., if we use "4-connected" for $X$, then we use "8-connected" for $\overline{X}$ and vice versa.

Remark 5. If any image $X$ is surrounded by a border of 0's, the component of $\overline{X}$ consisting of the points connected to (any one of) these 0's is called the outside of $X$ (Fig. 7(a)). If $\overline{X}$ has any other components, they are called holes in $X$ (Fig. 7(b)).

For more detailed discussion of geometric properties of images, the reader is referred to [33-35]. For equivalence relations, equivalence classes, and partitions, please refer to [30-32].

Definition 3.2 (basic properties of image transformations). The key properties of image transformations are the following ten basic properties:

1. Increasing. An image transformation $T(X)$ is increasing

$\leftrightarrow (X \subseteq Y \to T(X) \subseteq T(Y))$ for all $X, Y \in P(W)$.

2. Decreasing. An image transformation $T(X)$ is decreasing

$\leftrightarrow (X \subseteq Y \to T(Y) \subseteq T(X))$ for all $X, Y \in P(W)$.

Fig. 7. The outside and holes of an image.

3. Extensive. An image transformation $T(X)$ is extensive

$\leftrightarrow X \subseteq T(X)$ for all $X \in P(W)$.

4. Antiextensive. An image transformation $T(X)$ is antiextensive

$\leftrightarrow T(X) \subseteq X$ for all $X \in P(W)$.

5. Idempotent. An image transformation $T(X)$ is idempotent

$\leftrightarrow T(T(X)) = T(X)$ for all $X \in P(W)$.

6. Shift invariant. An image transformation $T(X)$ is shift invariant

$\leftrightarrow T(X \oplus P) = T(X) \oplus P$ for all $X, P \in P(W)$

and $P$ is a point image which consists of one and only one image point.

If an image transformation is not shift invariant, then it is shift variant:

$T(X \oplus P) \neq T(X) \oplus P$ (in general).

7. Homotopic. An image transformation $T(X)$ is homotopic

$\leftrightarrow$ there exists a one-to-one and onto correspondence between the connected components of $X$ and those of $T(X)$, for all $X \in P(W)$. The same is then true for the holes.

8. Commutative. A binary image operation $\cdot$ is commutative

$\leftrightarrow X \cdot R = R \cdot X$ for all $X, R \in P(W)$.

9. Associative. A binary image operation $\cdot$ is associative

$\leftrightarrow (X \cdot R) \cdot Q = X \cdot (R \cdot Q)$ for all $X, R, Q \in P(W)$.

10. Distributive. A binary image operation $\cdot$ is distributive over a binary image operation $+$

$\leftrightarrow X \cdot (R + Q) = (X \cdot R) + (X \cdot Q)$ for all $X, R, Q \in P(W)$.

Definition 3.3 (standard operations). Most standard operations can be derived from the three fundamental operations; eleven common ones follow:

1. Difference of $X$ by $R$ (Fig. 8(a)):

$$X/R = \{(x,y) \in X \mid (x,y) \notin R\} = X \cap \overline{R} = \overline{\overline{X} \cup R}.$$

Remark. $\overline{X} = W/X$, where $W$ is the universal image. The difference is an obvious approach to detect defects in the foreground of a tested image.

Fig. 8. Eleven standard derived image operations: (a) difference; (b) intersection; (c) erosion; (d) symmetric difference; (e) opening; (f) closing; (g) hit or miss transform (template matching); (h) thinning; (i) thickening; (j) a sequential thinning (used for homotopic skeletonization) [36]; (k) a conditional dilation.


2. Intersection of two images $X$ and $R$ (Fig. 8(b)):

$$X \cap R = \{(x,y) \mid (x,y) \in X \wedge (x,y) \in R\} = \overline{\overline{X} \cup \overline{R}}.$$

Remark. $X \cup R = \overline{\overline{X} \cap \overline{R}}$. If $X \cap R \neq \emptyset$, then we say that an image $X$ hits (or is joint with) an image $R$. If $X \cap R = \emptyset$, then we say that an image $X$ misses (or is disjoint with) an image $R$.




3. Erosion of an image $X$ by a reference image $R$ or foreground template matching of $X$ by $R$ (Fig. 8(c)):

$$X \ominus R = \overline{\overline{X} \oplus \check{R}}.$$

Remark. $X \oplus R = \overline{\overline{X} \ominus \check{R}}$, and $\check{R} = R$ when $R$ is symmetric. The erosion of an image $X$ by a reference image $R$ can be thought of as the complement of the dilation of the background by the reflection of the reference image $R$. In general, the erosion of a nonnull image $X$ by a nonnull reference image $R$ can be used to decrease the size of regions, increase the size of holes, eliminate regions, and break bridges in $X$; conversely, the dilation of a nonnull image $X$ by a nonnull reference image $R$ can increase the size of regions, decrease or fill in holes and cavities, and bridge gaps in $X$. Furthermore, the erosion can be interpreted as a foreground template matching where the foreground points of $X \ominus R$ indicate the occurrences of the foreground template $R$ in $X$ (for this purpose, the size of $R$ usually is much smaller than the size of $X$).

4. Symmetric difference of two images (mod 2 image addition or subtraction) (Fig. 8(d)):

$$X \,\Delta\, R = (X/R) \cup (R/X) = \overline{\overline{X} \cup R} \cup \overline{\overline{R} \cup X}.$$

Remark. The symmetric difference is a commutative operation, and its inverse operation can be defined as itself. In Section 4 we show that this operation is the parallel form of boolean EXCLUSIVE-OR. It is an obvious approach to detect defects (including the foreground or background defects) of a tested image.

5. Opening of an image $X$ by a reference image $R$ (Fig. 8(e)):

$$X \circ R = (X \ominus R) \oplus R = \overline{\overline{X} \oplus \check{R}} \oplus R.$$

Remark. The opening operation is an erosion followed by a dilation with the same reference image $R$. In general, the opening $X \circ R$ with a nonnull reference image $R$ reduces the size of regions and eliminates some image points by removing all features in $X$ which cannot contain the reference image $R$.

6. Closing of an image $X$ by a reference image $R$ (Fig. 8(f)):

$$X \bullet R = (X \oplus R) \ominus R = \overline{\overline{(X \oplus R)} \oplus \check{R}}.$$

Remark. The closing operation is a dilation followed by an erosion with the same reference image $R$. In general, the closing $X \bullet R$ with a nonnull reference image $R$ increases the size of regions and eliminates some background points by filling in all background areas that cannot contain the reference image $R$, such as holes and concavities in the image $X$.

7. Hit or miss transform $\circledast$ of an image $X$ by an image pair $R = (R_1, R_2)$ or template matching of $X$ by $R$ (Fig. 8(g)):

$$X \circledast R = (X \ominus R_1) \cap (\overline{X} \ominus R_2) = \overline{(\overline{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}.$$

Remark. The hit or miss transform of an image $X$ by a reference image pair $R = (R_1, R_2)$ is used to match the shape (or template) defined by the reference image pair $R$, where $R_1$ defines the foreground of the shape and $R_2$ defines the background of the shape. The key conditions are that the foreground $X$ must match $R_1$ (i.e., $X \ominus R_1$), while simultaneously the background $\overline{X}$ matches $R_2$ (i.e., $\overline{X} \ominus R_2$). In order to better define the hit or miss transform and its relationship with conventional boolean logic operations, we start from a pixel-wise boolean comparison to derive the hit or miss transform in shape recognition (Theorem 3.2). Note the similarity of the symmetric difference and the hit or miss transform.

8. Thinning $\oslash$ an image $X$ by an image pair $R = (R_1, R_2)$ (Fig. 8(h)):

$$X \oslash R = X/(X \circledast R) = \overline{\overline{X} \cup \overline{(\overline{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}}.$$

Remark. The thinning operation is antiextensive and decreases the size by removing the central points of the regions which match the reference image pair $R = (R_1, R_2)$.

9. Thickening $\odot$ an image $X$ by an image pair $R = (R_1, R_2)$ (Fig. 8(i)):

$$X \odot R = X \cup (X \circledast R) = X \cup \overline{(\overline{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}.$$

Remark. The thickening operation is extensive and increases the size by filling the image points where the regions match the reference image pair $R = (R_1, R_2)$.

10. Sequential operations (e.g., sequential dilation, sequential erosion, sequential thinning): If an image operation $\cdot$ is successively performed with each reference image (or image pair) in a sequence $(R_\theta) = (R_a, R_b, \ldots, R_z)$, then we define a sequential image operation

$$X \cdot (R_\theta) = (\cdots((X \cdot R_a) \cdot R_b) \cdots R_z).$$

Two examples are:

(a) Sequential thinning of an image $X$ by a sequence of image pairs $(R_\theta) = (R_a, R_b, \ldots, R_z)$:

$$X \oslash (R_\theta) = (\cdots((X \oslash R_a) \oslash R_b) \cdots \oslash R_z).$$

Remark. The sequential thinning is powerful in many applications, such as constructing a digital homotopic skeleton of an image $X$. Skeletonization of an image is an operation that transforms the image to a simplified image, called a skeleton, which emphasizes its connectivity. However, a homotopic skeleton cannot be obtained by digitizing an analog skeletonization algorithm; instead, a sequential thinning with a sequence of reference image pairs should be used. Several different algorithms employing different reference image pairs (called masks) have been proposed by several authors [6, 36]. Figure 8(j) shows an example of the skeletonization by a sequential thinning with a sequence of eight reference image pairs proposed by Levialdi et al. [36].

(b) Sequential dilation of an image $X$ and a sequence of reference images $(R_\theta) = (R_a, R_b, \ldots, R_z)$:

$$X \oplus (R_\theta) = (\cdots((X \oplus R_a) \oplus R_b) \cdots \oplus R_z).$$

Remark. Since the dilation is commutative and associative, in practice the dilation $X \oplus R$ with a large reference image $R$ is usually implemented as a sequential dilation with a sequence of small reference images. For example, if $R = E_1 \oplus E_2 \oplus \cdots \oplus E_k$, then

$$X \oplus R = (\cdots((X \oplus E_1) \oplus E_2) \oplus \cdots \oplus E_k);$$

and if $E = E_1 = E_2 = \cdots = E_k$, then

$$R = E^k \equiv \underbrace{E \oplus E \oplus \cdots \oplus E}_{k}.$$

11. Conditional operations (e.g., conditional dilation, conditional erosion, conditional thinning): An image operation $\cdot$ between an image $X$ and a reference image (or image pair) $R$ performed within a limiting set $Y$ is called a conditional operation and is denoted

$$X \cdot R \,|\, Y = (X \cdot R) \cap Y = \overline{\overline{X \cdot R} \cup \overline{Y}}.$$


Remark.  Figure  8(k)  gives  an  example  of  the  conditional  dilation. 
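The derived operations of Definition 3.3 can be written so that every one of them bottoms out in complement, union, and dilation. The following is a minimal Python sketch under the same assumptions as before (images as frozensets of coordinates in a finite window $W$); the function names are illustrative. Because $W$ is finite, erosion computed this way ignores reference points that fall outside the window, so the example keeps its objects away from the border.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

# --- the three fundamental operations ---
def complement(X, W):
    return frozenset(W - X)

def union(X, R):
    return frozenset(X | R)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def reflect(R):
    return frozenset((-x, -y) for (x, y) in R)

# --- derived operations, expressed through the fundamentals only ---
def intersection(X, R, W):
    return complement(union(complement(X, W), complement(R, W)), W)

def difference(X, R, W):
    return complement(union(complement(X, W), R), W)

def erode(X, R, W):
    return complement(dilate(complement(X, W), reflect(R), W), W)

def opening(X, R, W):
    return dilate(erode(X, R, W), R, W)

def closing(X, R, W):
    return erode(dilate(X, R, W), R, W)

def hit_or_miss(X, R1, R2, W):
    return intersection(erode(X, R1, W), erode(complement(X, W), R2, W), W)

def thin(X, R1, R2, W):
    return difference(X, hit_or_miss(X, R1, R2, W), W)

def thicken(X, R1, R2, W):
    return union(X, hit_or_miss(X, R1, R2, W))

W = universal_image(3)
square = frozenset(product((0, 1), repeat=2))   # 2 x 2 block
X = square | {(-2, -2)}                         # block plus an isolated point
R = frozenset({(0, 0), (1, 0)})                 # horizontal pair

print(sorted(erode(X, R, W)))                   # where the pair fits inside X
print(opening(X, R, W) == square)               # isolated point is removed
print(sorted(difference(X, opening(X, R, W), W)))
```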

3.2. Examples of Special Cases: Translation (Shifting), Expansion, Shrinking, and Projection

Translation (shifting), expansion, shrinking, and projection in a direction can be achieved by the dilation (or erosion) in a direct way.

1. Shifting an image $X$ from coordinate $(x, y)$ to coordinate $(x + i, y + j)$ is done by

$$X \oplus \{(i, j)\} = X \ominus \{(-i, -j)\}.$$

Remark. A point image $\{(i, j)\}$ corresponds to a discrete delta function $\delta(x - i, y - j)$. Thus, an image function $X(x, y)$ (which corresponds to the image $X$) convolved with the delta function $\delta(x - i, y - j)$ or correlated with $\delta(x + i, y + j)$ is the same as $X \oplus \{(i, j)\} = X \ominus \{(-i, -j)\}$.

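2. Adding a new 8-connected or 4-connected boundary to an image $X$ (i.e., expansion) is done by

$$X \oplus N_4 \quad \text{or} \quad X \oplus N_8,$$

where $N_4 \equiv I \cup A \cup A^{-1} \cup B \cup B^{-1}$ and $N_8 \equiv \bigcup_{i=-1}^{1} \bigcup_{j=-1}^{1} A^i B^j$.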

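3. Removing the 8-connected or 4-connected boundary of an image $X$ (i.e., shrinking) is done by

$$X \ominus N_4 = \overline{\overline{X} \oplus N_4} \quad \text{or} \quad X \ominus N_8 = \overline{\overline{X} \oplus N_8},$$

where $N_4 = I \cup A \cup A^{-1} \cup B \cup B^{-1}$ and $N_8 = \bigcup_{i=-1}^{1} \bigcup_{j=-1}^{1} A^i B^j$.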

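4. Projecting an image $X$ to distance $k$ in a direction $\theta$, i.e., producing a shadow of $X$, where the furthest image point in the shadow in the direction $\theta$ is at distance $k$ from the furthest image point in $X$ in the direction $\theta$; this can be achieved by

$$X \oplus \theta^k,$$

where $\theta$ can be any one of the following:

• East: $E = I \cup A$, $E^k = \bigcup_{i=0}^{k} A^i$

• South: $S = I \cup B^{-1}$, $S^k = \bigcup_{i=0}^{k} B^{-i}$

• West: $W = I \cup A^{-1}$, $W^k = \bigcup_{i=0}^{k} A^{-i}$

• North: $N = I \cup B$, $N^k = \bigcup_{i=0}^{k} B^i$

• Southeast: $S_E = I \cup AB^{-1}$, $S_E^k = \bigcup_{i=0}^{k} A^i B^{-i}$

• Southwest: $S_W = I \cup A^{-1}B^{-1}$, $S_W^k = \bigcup_{i=0}^{k} A^{-i} B^{-i}$

• Northwest: $N_W = I \cup A^{-1}B$, $N_W^k = \bigcup_{i=0}^{k} A^{-i} B^i$

• Northeast: $N_E = I \cup AB$, $N_E^k = \bigcup_{i=0}^{k} A^i B^i$

• Horizontal: $H = \bigcup_{i=-1}^{1} A^i$, $H^k = \bigcup_{i=-k}^{k} A^i$

• Vertical: $V = \bigcup_{i=-1}^{1} B^i$, $V^k = \bigcup_{i=-k}^{k} B^i$

• Left-diagonal: $L_D = \bigcup_{i=-1}^{1} A^{-i} B^i$, $L_D^k = \bigcup_{i=-k}^{k} A^{-i} B^i$

• Right-diagonal: $R_D = \bigcup_{i=-1}^{1} A^i B^i$, $R_D^k = \bigcup_{i=-k}^{k} A^i B^i$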
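Shifting, expansion, and projection all reduce to a single dilation with a suitable reference image. The sketch below illustrates this with a small example, assuming the frozenset image representation used for illustration elsewhere; `power` is a hypothetical helper that forms $\theta^k$ as $k$ successive dilations.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

I = frozenset({(0, 0)})
A = frozenset({(1, 0)})
N4 = frozenset({(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)})

def power(E, k, W):
    """E^k: k successive dilations by E, starting from I."""
    out = I
    for _ in range(k):
        out = dilate(out, E, W)
    return out

W = universal_image(4)
X = frozenset({(0, 0), (0, 1)})

shifted = dilate(X, frozenset({(2, -1)}), W)   # translate by (2, -1)
expanded = dilate(X, N4, W)                    # add a boundary layer
E = I | A                                      # East: I union A
E3 = power(E, 3, W)                            # union of A^0 .. A^3
shadow = dilate(X, E3, W)                      # X swept three steps east

print(sorted(shifted))
print(sorted(E3))
print(len(expanded), len(shadow))
```

Building `E3` by repeated dilation with the two-point image `E` mirrors the sequential dilation remark: a large reference image is assembled from small ones.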


3.3. Theorems for Low Level Vision

Here we summarize four theorems and some examples for binary image processing applications. Theorem 3.1 gives basic properties of the BIA fundamental operations and standard operations. We then describe the implementation of morphological filters, shape recognition algorithms, "salt" and "pepper" noise removal, and size and location verifications. The more obvious proofs are omitted for brevity.

TABLE 1(a)
Basic Properties of Three Fundamental Operations and of Three Derived Operations (Alternative Fundamental Operations)

Property                            | Complement     | Union            | Dilation                   | Difference | Intersection                | Erosion
(notation)                          | $\overline{X}$ | $X \cup R$       | $X \oplus R$               | $X/R$      | $X \cap R$                  | $X \ominus R$
Increasing                          | No             | Yes              | Yes                        | Yes        | Yes                         | Yes
Decreasing                          | Yes            | No               | No                         | No         | No                          | No
Extensive                           | No             | Yes              | Yes (if $R \supseteq I$)   | No         | No                          | No
Antiextensive                       | No             | No               | No                         | Yes        | Yes                         | Yes (if $R \supseteq I$)
Idempotent                          | No             | Yes              | No                         | Yes        | Yes                         | No
Shift invariant                     | No             | No               | Yes                        | No         | No                          | Yes
Homotopic                           | No             | No               | No                         | No         | No                          | No
Commutative                         | No             | Yes              | Yes                        | No         | Yes                         | No
Associative                         | No             | Yes              | Yes                        | No         | Yes                         | No
Distributive (with some operation)  | No             | Yes (with $\cap$) | Yes (with $\cup$)         | No         | Yes (with $\cup$, $\Delta$) | No

TABLE 1(b)
Basic Properties of Some Standard Derived Operations

Property                            | Symmetric difference | Opening     | Closing       | Thinning      | Thickening  | Homotopic skeletonization
(notation)                          | $X \,\Delta\, R$     | $X \circ R$ | $X \bullet R$ | $X \oslash R$ | $X \odot R$ | $X \oslash (R_\theta)$
Increasing                          | No                   | Yes         | Yes           | Yes           | Yes         | No
Decreasing                          | No                   | No          | No            | No            | No          | No
Extensive                           | No                   | No          | Yes           | No            | Yes         | No
Antiextensive                       | No                   | Yes         | No            | Yes           | No          | Yes
Idempotent                          | No                   | Yes         | Yes           | No            | No          | Yes
Shift invariant                     | No                   | Yes         | Yes           | Yes           | Yes         | Yes
Homotopic                           | No                   | No          | No            | No            | No          | Yes
Commutative                         | Yes                  | No          | No            | No            | No          | No
Associative                         | Yes                  | No          | No            | No            | No          | No
Distributive (with some operation)  | No                   | No          | No            | No            | No          | No

Theorem  3.1  (properties  of  image  operations).  The  BIA  fundamental  operations 
and  standard  operations  have  the  properties  shown  in  Table  1(a)  and  Table  1(b). 

Proof.  Appendix  D  gives  some  of  their  mathematical  expressions  which  follow 
from  the  definitions. 

Examples  of  morphological  filters.  Many  image  transformations  are  interpreted 
as  morphological  filtering  [2]  or  cellular  filtering  [6].  Some  major  morphological 
filters  are  listed  in  the  following: 

1. One kind of morphological low pass filter (Fig. 9(a)): removing high frequencies in the foreground of an image $X$ can be achieved by opening, i.e.,

$$X \circ R = (X \ominus R) \oplus R = \overline{\overline{X} \oplus \check{R}} \oplus R.$$

2. A second kind of morphological low pass filter (Fig. 9(b)): removing high frequencies in the background of an image $X$ can be achieved by closing, i.e.,

$$X \bullet R = (X \oplus R) \ominus R = \overline{\overline{(X \oplus R)} \oplus \check{R}}.$$

3. A morphological high pass filter (as shown in Fig. 9(c)) which removes low frequencies in the foreground of an image $X$ can be achieved by the difference of $X$ and its opening, i.e.,

$$X/(X \circ R) = X/((X \ominus R) \oplus R) = X/(\overline{\overline{X} \oplus \check{R}} \oplus R) = \overline{\overline{X} \cup (\overline{\overline{X} \oplus \check{R}} \oplus R)}.$$

4. A morphological band pass filter (as shown in Fig. 9(d)) which removes low frequencies and high frequencies in the foreground of an image $X$ can be achieved by the difference of its opening with a smaller reference image $R$ and its opening with a larger reference image $Q$, where $R \subseteq Q$, i.e.,

$$(X \circ R)/(X \circ Q) = ((X \ominus R) \oplus R)/((X \ominus Q) \oplus Q)$$

$$= (\overline{\overline{X} \oplus \check{R}} \oplus R)/(\overline{\overline{X} \oplus \check{Q}} \oplus Q)$$

$$= \overline{\overline{(\overline{\overline{X} \oplus \check{R}} \oplus R)} \cup (\overline{\overline{X} \oplus \check{Q}} \oplus Q)}.$$
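The filters listed above can be tried on a toy image that contains objects of three sizes. This is a hedged sketch, assuming the frozenset image representation and a $13 \times 13$ window; the helper `block` and the particular reference images $R$ ($2 \times 2$) and $Q$ ($3 \times 3$) are choices made for the example, not taken from the text.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def reflect(R):
    return frozenset((-x, -y) for (x, y) in R)

def erode(X, R, W):
    return complement(dilate(complement(X, W), reflect(R), W), W)

def opening(X, R, W):
    return dilate(erode(X, R, W), R, W)

def difference(X, R, W):
    return complement(complement(X, W) | R, W)

def block(x0, y0, size):
    """Filled square with lower-left corner (x0, y0)."""
    return frozenset(product(range(x0, x0 + size), range(y0, y0 + size)))

W = universal_image(6)
speck, small, large = block(-5, -5, 1), block(-2, -2, 2), block(2, 2, 3)
X = speck | small | large

R = block(0, 0, 2)      # smaller reference image
Q = block(-1, -1, 3)    # larger reference image, R is contained in Q

low_pass = opening(X, R, W)                              # drops the speck
high_pass = difference(X, low_pass, W)                   # keeps only the speck
band_pass = difference(low_pass, opening(X, Q, W), W)    # keeps the 2 x 2 block

print(low_pass == small | large, high_pass == speck, band_pass == small)
```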


Fig. 9. (a) One kind of morphological low pass filter (opening); (b) a second kind of morphological low pass filter (closing); (c) a morphological high pass filter; (d) a morphological band pass filter.

Theorem 3.2 (shape recognition (template matching)). 1. The locations of a shape that is defined by a nonnull reference image $R$ and a nonnull reference image (called mask) $M$ (Fig. 10(a)), with $R \subseteq M \subseteq W$ ($W$ is the universal image), can be detected by

$$(X \ominus R) \cap (\overline{X} \ominus (M/R)) = \overline{(\overline{X} \oplus \check{R}) \cup (X \oplus \check{(M/R)})} = X \circledast (R, M/R).$$

Equivalently, setting $R_1 = R$, $R_2 = M/R$, and redefining a nonnull reference image pair $R = (R_1, R_2)$ (Fig. 10(b)) yields the hit or miss transform of $X$ by $R$:

$$X \circledast R = (X \ominus R_1) \cap (\overline{X} \ominus R_2) = \overline{(\overline{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}.$$

2. The locations of a shape that is defined by a family of nonnull reference image pairs $\{R(\theta)\}$ with $\theta \in \Theta$ ($\Theta$ is the index set of the family of nonnull reference image pairs and $R(\theta) = (R_1(\theta), R_2(\theta))$) can be detected by the union of the hit or miss transform of $X$ by $R(\theta)$:

$$\bigcup_{\theta \in \Theta} X \circledast R(\theta) = \bigcup_{\theta \in \Theta} (X \ominus R_1(\theta)) \cap (\overline{X} \ominus R_2(\theta)) = \bigcup_{\theta \in \Theta} \overline{(\overline{X} \oplus \check{R}_1(\theta)) \cup (X \oplus \check{R}_2(\theta))}.$$

Proof. Appendix E.

Fig. 10. (a) One kind of shape recognition: $R$ represents the shape to be identified and must be entirely and exclusively in the mask defined by $M$. (b) Hit or miss transform which recognizes locations of foreground points given by $R_1$ in conjunction with background points given by $R_2$.
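As a small illustration of the theorem, the sketch below detects the corners of a square with the hit or miss transform. The corner template, its four rotated copies (playing the role of the family $R(\theta)$), and the helper names `hit_or_miss` and `rotate` are assumptions made for the example.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def reflect(R):
    return frozenset((-x, -y) for (x, y) in R)

def hit_or_miss(X, R1, R2, W):
    """Complement of (X-bar dilated by reflected R1) union (X dilated by reflected R2)."""
    Xc = complement(X, W)
    return complement(dilate(Xc, reflect(R1), W) | dilate(X, reflect(R2), W), W)

def rotate(R):
    """Rotate a reference image by 90 degrees about the origin."""
    return frozenset((-y, x) for (x, y) in R)

# Upper-right corner template: the point, its left and lower neighbours are
# foreground (R1); its right and upper neighbours are background (R2).
R1 = frozenset({(0, 0), (-1, 0), (0, -1)})
R2 = frozenset({(1, 0), (0, 1)})

W = universal_image(3)
X = frozenset(product((0, 1), repeat=2))        # 2 x 2 square

upper_right = hit_or_miss(X, R1, R2, W)

# Part 2 of the theorem: union over a family of pairs (four rotations here).
corners, pair = frozenset(), (R1, R2)
for _ in range(4):
    corners = corners | hit_or_miss(X, pair[0], pair[1], W)
    pair = (rotate(pair[0]), rotate(pair[1]))

print(sorted(upper_right), corners == X)
```

In a $2 \times 2$ square every point is a corner for one of the four orientations, so the union over the family returns the whole square.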

Theorem 3.3 ("salt" and "pepper" noise removal). 1. "Salt" noise removal (isolated image point removal) (Fig. 11(a)): to remove an image point if its 4-connected or 8-connected neighbors are background points (0's) can be achieved by

$$X \oslash Q_4 = \overline{\overline{X} \cup \overline{X \oplus M_4}}$$

or

$$X \oslash Q_8 = \overline{\overline{X} \cup \overline{X \oplus M_8}},$$

where $Q_4 = (I, M_4)$, $Q_8 = (I, M_8)$, $M_4 = A \cup A^{-1} \cup B \cup B^{-1} = N_4/I$ and $M_8 = N_8/I$.


Fig. 11. (a) "Salt" noise removal. (b) "Pepper" noise removal. (c) "Salt" and "pepper" noise removal.


2. "Pepper" noise removal (interior fill) (Fig. 11(b)): to create an image point at a coordinate if its 4-connected or 8-connected neighbors are image points (1's) can be achieved by

$$X \odot R_4 = X \cup \overline{\overline{X} \oplus M_4}$$

or

$$X \odot R_8 = X \cup \overline{\overline{X} \oplus M_8},$$

where $R_4 = (M_4, I)$, $R_8 = (M_8, I)$.




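3. "Salt and pepper" noise removal (Fig. 11(c)): to remove noise points that are completely surrounded by 4-connected neighbors or 8-connected neighbors of the opposite value can be achieved by

$$(X \oslash Q_4) \cup (X \circledast R_4) = \overline{\overline{X} \cup \overline{X \oplus M_4}} \cup \overline{X \cup (\overline{X} \oplus M_4)}$$

or

$$(X \oslash Q_8) \cup (X \circledast R_8) = \overline{\overline{X} \cup \overline{X \oplus M_8}} \cup \overline{X \cup (\overline{X} \oplus M_8)}.$$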


Proof.  Appendix  F. 

Remark. This theorem demonstrates the fact that many higher level operations (e.g., involving thinning and thickening) can be efficiently implemented by the three fundamental operations. Using the same design methodology as the "salt and pepper" noise removal, we can design many similar algorithms, such as spur removal, bridge break, and edge detection (perimeter). For example, the detection of the 4-connected or 8-connected edge of an image $X$ (Fig. 12) can be achieved by

$$X/(X \ominus N_8) = \overline{\overline{X} \cup \overline{(\overline{X} \oplus N_8)}}$$

or

$$X/(X \ominus N_4) = \overline{\overline{X} \cup \overline{(\overline{X} \oplus N_4)}}.$$
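The closed-form expressions for noise removal and edge detection use only complement, union, and dilation, so they translate almost symbol for symbol into code. The following is a minimal sketch under the frozenset image representation assumed for illustration; the function names and the test image (a $3 \times 3$ ring with one isolated point) are hypothetical.

```python
from itertools import product

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def union(X, R):
    return frozenset(X | R)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

M4 = frozenset({(1, 0), (-1, 0), (0, 1), (0, -1)})   # 4-neighbours without the origin
N8 = frozenset(product((-1, 0, 1), repeat=2))        # 8-neighbourhood with the origin

def remove_salt(X, M, W):
    """Drop foreground points that have no foreground neighbour in M."""
    return complement(union(complement(X, W), complement(dilate(X, M, W), W)), W)

def fill_pepper(X, M, W):
    """Add points whose neighbours in M are all foreground."""
    return union(X, complement(dilate(complement(X, W), M, W), W))

def remove_salt_and_pepper(X, M, W):
    """Salt-removed image united with the isolated background points."""
    holes = complement(union(X, dilate(complement(X, W), M, W)), W)
    return union(remove_salt(X, M, W), holes)

def edge(X, N, W):
    """Foreground points that have a background point within N."""
    Xc = complement(X, W)
    return complement(union(Xc, complement(dilate(Xc, N, W), W)), W)

W = universal_image(4)
solid = frozenset(product((-1, 0, 1), repeat=2))   # 3 x 3 block
ring = solid - {(0, 0)}                            # block with a one-pixel hole
X = ring | {(3, 3)}                                # plus one isolated point

print(remove_salt(X, M4, W) == ring)
print(fill_pepper(X, M4, W) == solid | {(3, 3)})
print(remove_salt_and_pepper(X, M4, W) == solid)
print(edge(solid, N8, W) == ring)
```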

Theorem 3.4 (size and location verification). The locations in an image $X$ of the regions including the reference image $R$ and included in the reference image $Q$, where $R \subseteq Q$, can be detected by

$$S((X \ominus R)/((X \ominus Q) \oplus Q)) = S\left(\overline{(\overline{X} \oplus \check{R}) \cup (\overline{\overline{X} \oplus \check{Q}} \oplus Q)}\right),$$

where $S(\,)$ means the homotopic skeletonization. (An example is given in Fig. 13.)

Proof. Appendix G.

The above theorems serve as the typical rules for morphological image processing. In fact, there are many ways to analyze the shapes and sizes of an image by using only the three fundamental operations. As another example, comparing an image $X$ with its convex hull $C(X)$ [34] is a useful technique to analyze shape. If there is only one object, or objects separated by distances greater than their own diameters, in the image $X$, then its convex hull is the intersection of projections (Fig. 14(a)),

$$\bigcap_{i} (X \oplus \theta_i^{k}),$$

where $\theta_i$, $i = 1, 2, 3, 4$, are $H$, $V$, $R_D$, $L_D$ (defined in Definition 3.4), and $k$ should be greater than the longest radius of objects in $X$. Then the difference of the convex hull and the image, $C(X)/X$, indicates how many concavities the image $X$ has and what their individual shapes and sizes are. Figure 14(b) illustrates an example.

Fig. 14. (a) An example of the convex hull of an image $X$ (implemented by the intersection of projections). (b) The difference of $C(X)$ by $X$.


4.  RELATIONSHIP  TO  OTHER  COMPUTING  THEORIES 
4.1.  Relationship  to  Boolean  Logic 

BIA  can  implement  any  boolean  logic  operation  on  binary  images.  It  is  also 
obvious  that  BIA  fundamental  operations  can  be  implemented  by  a  boolean  logic 
gate  array  with  interconnections.  The  following  straightforward  correspondence  can 
be  drawn  between  the  BIA  operations  and  boolean  logic  operations: 


BIA operations              Boolean logic operations

1. Complement               NOT
2. Union                    OR
3. Dilation                 Multiple-input OR
4. Intersection             AND
5. Erosion                  Multiple-input AND
6. Symmetric difference     EXCLUSIVE-OR
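The correspondence in the table can be made concrete at the pixel level. In the sketch below each output pixel of a dilation is computed as a multiple-input OR, and each output pixel of an erosion as a multiple-input AND, over input pixels of the same image selected by $R$; the results are compared with the set-based definitions. The treatment of inputs that fall outside the window (ignored by the AND) is an assumption chosen to agree with the complement-based erosion on a finite window, and the function names are hypothetical.

```python
from itertools import product
import random

def universal_image(n):
    return frozenset(product(range(-n, n + 1), repeat=2))

def complement(X, W):
    return frozenset(W - X)

def dilate(X, R, W):
    return frozenset({(a + c, b + d) for (a, b) in X for (c, d) in R} & W)

def erode(X, R, W):
    reflected = frozenset((-x, -y) for (x, y) in R)
    return complement(dilate(complement(X, W), reflected, W), W)

def dilate_gates(X, R, W):
    """One multiple-input OR gate per output pixel; R selects the inputs."""
    return frozenset(
        p for p in W
        if any((p[0] - rx, p[1] - ry) in X for (rx, ry) in R)
    )

def erode_gates(X, R, W):
    """One multiple-input AND gate per output pixel; inputs outside W are skipped."""
    return frozenset(
        p for p in W
        if all((p[0] + rx, p[1] + ry) in X
               for (rx, ry) in R
               if (p[0] + rx, p[1] + ry) in W)
    )

rng = random.Random(0)   # fixed seed so the comparison is repeatable

def random_image(W, density=0.4):
    return frozenset(p for p in sorted(W) if rng.random() < density)

W = universal_image(2)
trials = [(random_image(W), random_image(W)) for _ in range(50)]
agree = all(
    dilate_gates(X, R, W) == dilate(X, R, W)
    and erode_gates(X, R, W) == erode(X, R, W)
    for X, R in trials
)
print(agree)
```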


Note that the inputs of OR and AND (corresponding to union and intersection) come from two different images. The multiple inputs of OR and AND (corresponding to dilation and erosion) come from the same image, while the other operand image $R$ only determines the number and location of input pixel values. A complete logical set is able to implement any boolean logic function; it consists of at least one of the following sets: NOT and OR; NOT and AND; NAND; NOR. In BIA, in order to implement any image transformation, we need a complete system of pixelwise logic operations and we also need a translational type of operation (such as translation, dilation, erosion, convolution, and correlation) to allow the global information extraction in an image or the information exchange between pixels of the same images. In order to have a 2D compact parallel form of image processing algorithms whose variables are whole images, we define the parallel form of those corresponding boolean logic operations as BIA operations. In fact, there are two boolean algebras, $(P(W); \cup, \cap, \overline{\;\cdot\;}, \emptyset, W)$ and $(P(W); \Delta, \cap, \overline{\;\cdot\;}, \emptyset, W)$, supported by BIA also (Subsection 4.4). We can define several possible sets of fundamental operations for implementing any image transformation, such as a parallel form of NOR (or NAND or (NOT and OR) or (NOT and AND)) and a translational-type operation (e.g., translation, dilation, erosion, convolution, and correlation). The reasons that we choose complement, union, and dilation as the three fundamental operations are:


•  Nice mathematical properties. The dilation is commutative, associative, and
distributive over the union; but the erosion has no such properties.

•  Simple hardware implementation. These three operations are easily
implemented by the 2D gate array and 3D interconnection technique.

•  Simple software design. These three operations are inherently parallel and
frequently used operations. Algorithms can be written as compact formulas which
easily become very efficient fast parallel algorithms by simply applying the
fundamental operations and removing the data dependencies.
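
As an illustration of how the three fundamental operations suffice, the following minimal sketch represents a binary image as a set of (x, y) points inside a finite universal image W and derives intersection and erosion from complement, union, and dilation. The function names and the set-of-points representation are assumptions made for this example only; they are not taken from the paper.

```python
from itertools import product

def universe(n):
    """All points of an n x n universal image W."""
    return frozenset(product(range(n), repeat=2))

def complement(x, w):
    """Complement of image X with respect to W."""
    return w - x

def union(x, r):
    return x | r

def dilation(x, r, w):
    """X (+) R: translate X by every point of R, clipped to W."""
    return frozenset((a + c, b + d) for a, b in x for c, d in r) & w

def reflect(r):
    """Reflected reference image: {(-x, -y) | (x, y) in R}."""
    return frozenset((-c, -d) for c, d in r)

def intersection(x, r, w):
    """Derived: complement of the union of the complements."""
    return complement(union(complement(x, w), complement(r, w)), w)

def erosion(x, r, w):
    """Derived: complement of (complement of X) dilated by reflected R."""
    return complement(dilation(complement(x, w), reflect(r), w), w)

W = universe(7)
E = frozenset(product((-1, 0, 1), repeat=2))     # 3 x 3 reference image
X = frozenset(product(range(2, 5), repeat=2))    # 3 x 3 square object
dilated = dilation(X, E, W)                      # 5 x 5 square
eroded = erosion(X, E, W)                        # only the center survives
```

In this sketch points outside W are simply ignored, so objects touching the border of W behave differently from interior objects; the example keeps the object away from the border.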

Comparing  BIA  with  the  conventional  boolean  expressions  for  logic  functions, 
the  major  advantages  of  BIA  are  summarized  in  the  following: 

•  BIA  operations  are  inherently  parallel,  but  boolean  logic  operations  are 
serial. 

•  BIA  operations  include  parallel  information  transferring  capabilities  which 
are  missing  in  boolean  logic  operations. 

•  Algorithms in BIA are written as compact algebraic formulas whose variables
are whole images, while a typical image processing algorithm is very difficult
to write in a compact precise boolean logic expression.

•  BIA  has  pictorial  physical  meaning,  while  boolean  expressions  provide  little 
physical  feeling  for  parallel  image  processing  algorithms. 

4.2.  Relationship  to  Symbolic  Substitution  and  Cellular  Logic 

Symbolic substitution is a means of performing parallel digital computations and
can be used to implement boolean logic, binary arithmetic, cellular logic, and
Turing machines [37, 38]. It involves two steps: (1) recognizing all the
locations of a certain spatial pattern within the 2D input data and (2)
substituting a new replacement pattern wherever the search pattern was
recognized. BIA can be used to realize a symbolic substitution rule:

$$(X \circledast R) \oplus Q = \overline{(\bar{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)} \oplus Q,$$

where X is the 2D input data, $R = (R_1, R_2)$ is the reference image pair
corresponding to the search pattern ($R_1$ and $R_2$ define the foreground and
the background of the search pattern, respectively), $\check{R}$ defines a
reflected reference image given by $\check{R} = \{(-x, -y) \mid (x, y) \in R\}$,
and Q is the reference image corresponding to the replacement pattern. Thus,
symbolic substitution rules are particular BIA image transformations having the
above form, and BIA represents a general, complete, systematic mathematical tool
for formalizing symbolic substitution algorithms.
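
The rule above can be sketched in a few lines. The example below is only an illustration under assumed conventions (images as sets of points, a small rectangular W, and a made-up search and replacement pattern); the function name `symbolic_substitution` is hypothetical.

```python
from itertools import product

def dilation(x, r, w):
    return frozenset((a + c, b + d) for a, b in x for c, d in r) & w

def reflect(r):
    return frozenset((-c, -d) for c, d in r)

def symbolic_substitution(x, r1, r2, q, w):
    """Recognize foreground r1 / background r2, then write q at each match."""
    misses = dilation(w - x, reflect(r1), w) | dilation(x, reflect(r2), w)
    matches = w - misses               # locations of the search pattern
    return dilation(matches, q, w)     # stamp the replacement pattern

W = frozenset(product(range(6), range(3)))
X = frozenset({(1, 1), (2, 1), (4, 1)})
R1 = frozenset({(0, 0)})    # foreground of the search pattern: this pixel is 1
R2 = frozenset({(1, 0)})    # background of the search pattern: right neighbor is 0
Q = frozenset({(1, 0)})     # replacement: a single 1 one step to the right
Y = symbolic_substitution(X, R1, R2, Q, W)
```

Here the pixels (2, 1) and (4, 1) match the search pattern, so the output holds the replacement pattern at (3, 1) and (5, 1).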

Cellular  logic  architectures  have  been  briefly  reviewed  in  Section  1.  A  cellular 
logic  operation  transforms  an  array  of  data  into  a  new  array  of  data  where  each 
element  in  the  new  array  has  a  value  determined  only  by  the  corresponding  element 
in  the  original  array  along  with  the  values  of  its  neighbors  (Fig.  1).  In  BIA,  an  image 
transformation  can  be  written  as  a  polynomial  of  reference  images  (Theorem  2.1), 
where the reference images can have arbitrarily large size. In terms of cellular logic,
the  reference  image  essentially  defines  the  neighborhood  of  a  cell  where  the 
neighborhood  can  be  very  large  and  not  just  nearest  4-  or  8-neighborhood  as  in 
conventional  cellular  logic.  Thus,  cellular  logic  operations  are  also  particular  cases 
of  image  transformations  with  small  local  reference  images,  and  BIA  also  serves  as  a 
systematic  mathematical  tool  for  formalizing  cellular  logic. 

Because of existing hardware interconnection limitations, it is difficult and
costly to implement an image transformation with a large reference image in one
clock cycle. In addition, the conventional nearest-neighbor connected cellular
arrays have poor communication capabilities. To improve this, we develop the
DOCIP-hypercube architecture in Section 5, which combines features of
conventional nearest-neighbor connected cellular logic architectures and
conventional hypercube architectures for implementing BIA effectively.

In summary, BIA provides a systematic mathematical formalism for both symbolic
substitution and cellular logic. The applications of symbolic substitution and
cellular logic can be accomplished by BIA; on the other hand, generalized
cellular logic architectures are good candidates for implementing BIA.

4.3. Relationship to Linear Shift Invariant Systems, Convolution, and Correlation

It is well known that the theory of linear shift-invariant (LSI) systems plays a
key role in conventional signal (including image) and system analysis [39, 40].
It is natural to ask what the relation between BIA and LSI system theory is. A
system is defined as a transformation or mapping from a set of input functions
into a set of output functions, and a 2-dimensional discrete LSI system is
defined as a system which obeys two properties:

•  Linearity. $T[a\,x(i,j) + b\,z(i,j)] = a\,T[x(i,j)] + b\,T[z(i,j)]$ for
arbitrary constants a and b;

•  Shift-invariance. $y(i,j) = T[x(i,j)] \Rightarrow y(i-k, j-l) = T[x(i-k, j-l)]$.

A linear system can be completely characterized by its unit-impulse response
$r(i,j;k,l) = T[\delta(i-k, j-l)]$. In an LSI system the unit-impulse response
is simply $r(i,j;k,l) = r(i-k, j-l)$, and the output of an LSI system with input
$x(i,j)$ and unit-impulse response $r(i,j)$ is the convolution of $x(i,j)$ and
$r(i,j)$, denoted by

$$x(i,j) * r(i,j) = \sum_{k,l=-\infty}^{\infty} x(k,l)\, r(i-k, j-l).$$

Now, let us consider only binary images. In terms of the set notation, an image
$X = \{(i,j) \mid x(i,j) = 1\}$ corresponds to the function $x(i,j)$. If we
assume $r(i,j) = 1$ at and only at n points which correspond to an image R with
n image points, then the convolution of $x(i,j)$ and $r(i,j)$ with a threshold
t = 0 is

$$
\begin{aligned}
X * R\,|_{t=0} &= \Big\{(i,j) \;\Big|\; \sum_{k,l} x(k,l)\, r(i-k, j-l) > 0\Big\} \\
&= \Big\{(i+k, j+l) \;\Big|\; \sum_{k,l} x(k,l)\, r(i,j) > 0\Big\} \\
&= \{(i+k, j+l) \mid x(k,l)\, r(i,j) > 0\} \\
&= \{(i+k, j+l) \mid (k,l) \in X,\ (i,j) \in R\} \\
&= X \oplus R,
\end{aligned}
$$

where the output of the threshold is defined as 1 if $x(i,j) * r(i,j) > 0$, and
is 0 otherwise; and the universal image, as before, contains all image points
(i, j), (k, l), and (i + k, j + l). This means that the dilation $X \oplus R$ is
the same as applying a threshold t = 0 to the convolution sum. The reference
image plays a role similar to that of the unit-impulse response in the binary
image system. Similarly, the erosion $X \ominus R$ is the same as the
convolution $x(i,j) * r(-i,-j)$ followed by the threshold t = n - 1.
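
The equivalence between thresholded convolution and the two morphological operations can be checked numerically. The sketch below is an assumption-laden illustration (images as sets of points, a small 8 x 8 universal image, hypothetical function names): it computes the convolution sum point by point, applies the thresholds t = 0 and t = n - 1, and compares the results with the set definitions of dilation and erosion.

```python
from itertools import product

def conv_count(x, r, i, j):
    """Convolution sum at (i, j): sum over (k, l) of x(k, l) r(i - k, j - l)."""
    return sum(1 for k, l in x if (i - k, j - l) in r)

def dilation_by_threshold(x, r, w):
    """Threshold t = 0 applied to the convolution of x and r."""
    return frozenset(p for p in w if conv_count(x, r, *p) > 0)

def erosion_by_threshold(x, r, w):
    """Threshold t = n - 1 applied to the convolution of x and r(-i, -j)."""
    n = len(r)
    r_reflected = frozenset((-c, -d) for c, d in r)
    return frozenset(p for p in w if conv_count(x, r_reflected, *p) > n - 1)

W = frozenset(product(range(8), repeat=2))
X = frozenset({(1, 1), (2, 1), (3, 1), (2, 2), (3, 2), (5, 5)})
R = frozenset({(0, 0), (1, 0), (0, 1)})

# Direct set definitions for comparison.
dilation_direct = frozenset((a + c, b + d) for a, b in X for c, d in R) & W
erosion_direct = frozenset(
    p for p in W if all((p[0] + c, p[1] + d) in X for c, d in R))
```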

Correlators have been used in pattern recognition for a long time [41].
Correlation is strongly related to convolution: convolution involves folding,
shifting, and summing; correlation involves shifting and summing without
folding. Therefore,

$$X \oplus R = X * R\,|_{t=0} = X \circ \check{R}\,|_{t=0},$$

where $*$ means convolution, $\circ$ means correlation, and $\check{R}$ means the
reflected image of R.

Furthermore, although the three fundamental operations of BIA are nonlinear,
with appropriate number representations they are able to implement parallel
numerical and linear operations too. Also, BIA can implement both
shift-invariant and shift-variant image transformations.

4.4.  Some  Standard  Algebraic  Structures 

Some algebraic structures supported by BIA are:

1. $(P(W); \oplus)$ is a semigroup.

2. $(P(W); \oplus, I)$ is a monoid.

3. $(P(W); \Delta, \emptyset, \Delta)$ is an abelian group.

4. $(P(W); \cup, \cap, \bar{\ }, \emptyset, W)$ and
$(P(W); \Delta, \cap, \bar{\ }, \emptyset, W)$ are Boolean algebras.

5. $(P(W); \subseteq)$ is a poset (partially ordered set).

6. $(P(W); \cup, \cap, \subseteq)$ is a complete lattice.

Proof. (1) A semigroup is a set with an associative binary operation [30-32]. By
Theorem 3.1, the dilation $\oplus$ is associative for all images in P(W).

(2) A monoid is a semigroup with an identity [30-32]. By Appendix D, the
dilation has an identity I = {(0, 0)}. Note that $(P(W); \ominus)$ is neither a
semigroup nor a monoid.

(3) A group is a monoid in which every element has an inverse. An abelian group
is a group in which the operation is commutative [30-32]. By the definition of
symmetric difference (mod 2 image addition), it can be easily verified that its
identity is $\emptyset$ and its inverse operation (mod 2 image subtraction) is
itself.

(4) A boolean algebra is a set with operations $\vee$, $\wedge$, $\bar{\ }$, 0,
and 1 satisfying: 1. $a \vee b = b \vee a$, $a \wedge b = b \wedge a$
(commutativity); 2. $a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$,
$a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c)$ (distributivity);
3. $a \vee 0 = a$ (universal bound); 4. $a \wedge 1 = a$ (universal bound);
5. $a \vee \bar{a} = 1$, $a \wedge \bar{a} = 0$ (complementarity) [30-32]. By
Appendix D, $(P(W); \cup, \cap, \bar{\ }, \emptyset, W)$ and
$(P(W); \Delta, \cap, \bar{\ }, \emptyset, W)$ are Boolean algebras.

(5) A poset is a set with a relation satisfying: 1. reflexivity; 2.
antisymmetry; and 3. transitivity [30-32]. The relation $\subseteq$ satisfies
these three conditions: 1. $X \subseteq X$ for all $X \in P(W)$; 2. if
$X \subseteq R$ and $R \subseteq X$, then X = R; and 3. if $X \subseteq R$ and
$R \subseteq Q$, then $X \subseteq Q$.

(6) A complete lattice is a poset $(S; \le)$ in which every subset of S has a
sup (the least upper bound) and an inf (the greatest lower bound) [30-32]. In
the algebra $(P(W); \cup, \cap, \subseteq)$, given any subset of P(W), say
$\{X(\theta) \mid \theta \in \Theta\}$ ($\Theta$ is the index set of the
elements in this subset of P(W)), we have

$$\sup = \bigcup_{\theta \in \Theta} X(\theta),$$

$$\inf = \bigcap_{\theta \in \Theta} X(\theta).$$

Thus, several standard algebraic structures and their properties can be directly
implemented and used in BIA.
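
A few of these structures can be spot-checked on small examples. The sketch below is illustrative only: it works on the unbounded integer plane rather than a finite W, uses Python's built-in symmetric difference operator `^`, and the example images and helper names are assumptions. A handful of passing cases is of course not a proof.

```python
def dilation(x, r):
    return frozenset((a + c, b + d) for a, b in x for c, d in r)

def erosion(x, r):
    """Points p with p + r in X for every r in R (R assumed non-empty)."""
    candidates = {(a - c, b - d) for a, b in x for c, d in r}
    return frozenset(p for p in candidates
                     if all((p[0] + c, p[1] + d) in x for c, d in r))

I = frozenset({(0, 0)})                      # identity image for dilation
X = frozenset({(0, 0), (1, 0)})
R = frozenset({(0, 0), (0, 1), (2, 1)})
Q = frozenset({(1, 1), (3, 0)})

# (1) semigroup: dilation is associative on these images.
semigroup = dilation(dilation(X, R), Q) == dilation(X, dilation(R, Q))
# (2) monoid: I is a two-sided identity for dilation.
monoid = dilation(X, I) == X == dilation(I, X)
# (3) abelian group under symmetric difference.
abelian_group = (X ^ R == R ^ X and X ^ frozenset() == X
                 and X ^ X == frozenset() and (X ^ R) ^ Q == X ^ (R ^ Q))
# Erosion is not associative: a counterexample with R = Q = X.
erosion_associative = erosion(erosion(X, X), X) == erosion(X, erosion(X, X))
```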

5.  IMPLEMENTATION  ON  OPTICAL  CELLULAR  LOGIC  PROCESSORS 

To map algorithms into architectures, we first use an algebraic approach for
describing a cellular image processor. Then we design the digital optical
cellular image processors (DOCIPs) and their optical implementation. Figures 15
and 16 show an optical concept for the DOCIP implementation. The optical system
can realize an array of cells by a spatially parallel 2D array of optical binary
gates and performs interconnections of these gates by an optical hologram. The
DOCIPs are:

•  The DOCIP-array (Fig. 15), a cellular array processor, which uses optical
parallelism to map an inherently 2D parallel data structure to a 2D
nearest-neighbor connected cellular computer in a simple and direct way; its
performance is primarily limited by its O(1) interconnectivity, and

•  The DOCIP-hypercube (Fig. 16), a 2-dimensional cellular hypercube, which
uses optical parallelism and 3D global interconnection capabilities to implement
a hypercube interconnection.

Fig. 15. An optical 4-connected or 8-connected cellular array (DOCIP-array4 or
DOCIP-array8). Imaging optics are omitted for clarity. Each cell connects with
its four nearest cells and itself by optical 3D free interconnection. The optical
hologram provides both intra-cell and inter-cell interconnections. The input and
output sides of the optical gate array (each an N × N array of cells) are
interconnected by an optical feedback path and are shown separately for clarity.


Here, the 2-dimensional cellular hypercube is used to match the structure of a
2-dimensional image and further improve the communication ability of a cellular
array. Ideally, a conventional hypercube (Fig. 17) increases the
interconnectivity to O(log N) for N computation cells; however, when laid out in
2-dimensional space, its interconnection patterns are not space invariant; such
spatial invariance is desirable for image processing and for simple
implementation in optical hardware. To include this, we increase the
interconnections to make a 2-dimensional cellular hypercube (Fig. 18). The
cellular hypercube introduces a symmetrical positive and negative index so that
each cell is connected with cells having a relative one bit difference in
coordinate label in positive or negative x and y directions; the numerical
difference of addresses of connected cells is nonzero in at most one bit [42].

Fig. 16. An optical 4-directed or 8-directed cellular hypercube
(DOCIP-hypercube4 or DOCIP-hypercube8). Each cell connects with cells in the 4
directions or 8 directions at distances 1, 2, 4, 8, ..., $2^k$ from it by
optical 3D free interconnection. The input and output sides of the array are
implemented by an optical gate array.

Fig. 17. A conventional hypercube (4-cube) laid out in 2-dimensional space. Its
interconnections have no spatial invariance.

Fig. 18. A 2-dimensional cellular hypercube (DOCIP-hypercube). Each cell is
interconnected with other cells having a relative one bit difference in
coordinate label in positive or negative x and y directions to achieve a
spatially symmetric and invariant interconnection pattern. Only connections from
the central cell are shown; all cells are connected identically so the resulting
interconnections are space invariant. The legend distinguishes connections in
the 4-directed cellular hypercube from connections in the 8-directed cellular
hypercube.

5.1.  Algebraic  Description 

Having defined cellular automata and the implementation requirements of BIA,
we describe the DOCIP in an algebraic way:

Definition of Cellular Automata. A cellular automaton is an algebra
$A = (S; F, N_c)$, where S is the state space which is a set of states, F is a
family of transition functions, and $N_c$ is the neighborhood configuration.

Constraints of Implementing BIA.

1. $S \supseteq P(W)$

2. $F \supseteq \{\oplus, \cup, \bar{\ }\}$

3. $N_c \supseteq I \cup A \cup A^{-1} \cup B \cup B^{-1}$ (or
$N_c \supseteq A \cup A^{-1} \cup B \cup B^{-1}$), where "$\supseteq$" means
"contains."

Thus, in terms of cellular automata, the DOCIPs have to satisfy the above
constraints for realizing BIA. For storing input images and temporary results in
a more flexible way, the DOCIPs utilize three memory modules and share the same
algebraic structure (except the neighborhood configuration):

$$\mathrm{DOCIP} = (P(W \times W \times W); \oplus, \cup, \bar{\ }, N_c),$$

where "×" denotes cross product and $N_c$ can be one of the following four
types:

1. DOCIP-array4. Each cell connects with its four nearest neighbors and itself,
i.e.,

$$N_{\mathrm{array4}} = I \cup A \cup A^{-1} \cup B \cup B^{-1}.$$

2. DOCIP-array8. Each cell connects with its eight nearest neighbors and itself,
i.e.,

$$N_{\mathrm{array8}} = \bigcup_{i,j=-1}^{1} A^i B^j.$$

3. DOCIP-hypercube4. Each cell connects with itself and those cells in the four
directions at distances 1, 2, 4, 8, ..., $2^k$ from itself, i.e.,

$$N_{\mathrm{hypercube4}} = \bigcup_{i=0,\pm 1,\pm 2,\ldots,\pm 2^k} (A^i \cup B^i),$$

where k is sufficiently large for the connections to traverse the entire array
of cells.

4. DOCIP-hypercube8. Each cell connects with itself and those cells in the
eight directions at distances 1, 2, 4, 8, ..., $2^k$ from itself, i.e.,

$$N_{\mathrm{hypercube8}} = \bigcup_{i=0,\pm 1,\pm 2,\ldots,\pm 2^k} (A^i \cup B^i \cup A^i B^i \cup A^i B^{-i}).$$
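
The four neighborhood configurations can be enumerated as sets of relative offsets. The sketch below assumes that the elementary images are A = {(1, 0)} and B = {(0, 1)}, so that $A^i B^j$ is the single offset (i, j); these definitions lie outside this section, and the function names are hypothetical.

```python
def n_array4():
    """The cell itself and its four nearest neighbors."""
    return {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def n_array8():
    """The cell itself and its eight nearest neighbors."""
    return {(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)}

def _shifts(k):
    """The index set 0, +-1, +-2, +-4, ..., +-2**k."""
    return [0] + [sign * 2 ** m for m in range(k + 1) for sign in (1, -1)]

def n_hypercube4(k):
    """Offsets along the two axes at distances 0, 1, 2, ..., 2**k."""
    return {(i, 0) for i in _shifts(k)} | {(0, i) for i in _shifts(k)}

def n_hypercube8(k):
    """Offsets along the axes and both diagonals at distances 0, 1, ..., 2**k."""
    return {p for i in _shifts(k) for p in ((i, 0), (0, i), (i, i), (i, -i))}

sizes = (len(n_array4()), len(n_array8()),
         len(n_hypercube4(3)), len(n_hypercube8(3)))
```

With k = 3 (distances 1, 2, 4, 8) the 8-directed hypercube neighborhood has 33 offsets, which agrees with the 33 mask bits quoted for a 31 × 31 array in Fig. 24.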
5.2. General Description

From the above algebraic description, the DOCIPs have the same algebraic
structure and differ only in their neighborhood configurations $N_c$. Thus, they
share the same architecture as shown in Fig. 19, but have different
configurations of the reference images $E_i$, depending on the optical
interconnection network which defines the neighborhood. In practical
applications, a larger reference image R can be generated from a set of smaller
reference image(s) $E_i$ by a "sequential dilation." If it is possible to
decompose R into a sequence $R = E_1 \oplus E_2 \oplus \cdots \oplus E_k$, then

$$X \oplus R = (\cdots((X \oplus E_1) \oplus E_2) \oplus \cdots \oplus E_k).$$

This decomposition may not exist, in which case R can always be decomposed as
$R = R_1 \cup R_2 \cup \cdots \cup R_k$, and then

$$X \oplus R = (X \oplus R_1) \cup (X \oplus R_2) \cup \cdots \cup (X \oplus R_k),$$

where each $R_i$ can be composed from the smaller reference images $E_i$.
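
Both decompositions are easy to check on a small case. The following sketch is illustrative, with images as sets of points on the unbounded plane and example reference images chosen for this purpose only.

```python
from itertools import product

def dilation(x, r):
    return frozenset((a + c, b + d) for a, b in x for c, d in r)

E = frozenset(product((-1, 0, 1), repeat=2))      # 3 x 3 reference image
X = frozenset({(0, 0), (4, 1), (5, 1)})

# Sequential dilation: R = E (+) E is the 5 x 5 square.
R = dilation(E, E)
one_step = dilation(X, R)
two_steps = dilation(dilation(X, E), E)

# Union decomposition: a cross split into a row R1 and a column R2.
R1 = frozenset({(-1, 0), (0, 0), (1, 0)})
R2 = frozenset({(0, -1), (0, 0), (0, 1)})
cross_direct = dilation(X, R1 | R2)
cross_by_parts = dilation(X, R1) | dilation(X, R2)
```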

Basically, the proposed DOCIP as shown in Fig. 19 is a cellular SIMD machine
and consists of an array of cells or processing elements (PEs) under the
supervision of a control unit. The control unit includes a clock, a program
counter, a test and branch module for feedback control, and an instruction
decoder for storing instructions and decoding them to supervise cells. The array
of cells includes a $1 \times 3 \times N^2$ bit destination selector, three
$N \times N \times 1$ bit memories for storing images, a memory selector, and a
dilation unit.

The DOCIP shown in Fig. 19 operates as follows: (1) a binary image (N × N
matrix) is selected by the destination selector and then stored in any memory as
the instruction specifies; (2) after storing the images (one to three N × N
matrices), these images and their complemented versions are piped into the next
stage, which forms the union of any combination of images; (3) the result is
sent to a dilation unit where the reference image specified by the instruction
is used to control the type of dilation; (4) finally, the dilated image can be
output, tested for program control, or fed back to step (1) by the address field
of the instruction.

Fig. 19. A digital optical cellular image processor (DOCIP) architecture: one
implementation of binary image algebra (BIA). The image data is an N × N matrix
supervised by the control unit. The DOCIP-array requires 9 (or 5) control bits
for reference image $E_i$. The DOCIP-hypercube requires O(log N) control bits
for reference image $E_i$.
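
The four-stage cycle described above can be mimicked in software. The sketch below is a hypothetical software model, not the optical implementation: the function `execute`, its argument names, and the way the memory selection is encoded are all assumptions introduced for illustration.

```python
from itertools import product

def execute(memories, select, reference, dest, w):
    """One DOCIP-style cycle (hypothetical encoding of the instruction).

    select    -- pairs (memory index, complemented?) fed to the union stage
    reference -- reference image (set of offsets) used by the dilation unit
    dest      -- indices of the memories that store the result
    """
    combined = frozenset()
    for index, complemented in select:
        image = memories[index]
        combined |= (w - image) if complemented else image
    result = frozenset((a + c, b + d)
                       for a, b in combined for c, d in reference) & w
    for index in dest:
        memories[index] = result
    return result

W = frozenset(product(range(5), repeat=2))
E = frozenset(product((-1, 0, 1), repeat=2))
memories = [W - {(2, 2)}, frozenset(), frozenset()]   # X held in memory 1

# Complement of memory 1, dilated by E, stored in memory 2.
execute(memories, select=[(0, True)], reference=E, dest=[1], w=W)
```

Selecting several (memory, complemented) pairs at once gives the union stage; selecting a reference image that contains only (0, 0) turns the dilation stage into a pass-through.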

The entire system can be realized by an optical gate array with optical 3D
interconnections [25-28]. It should be noted that current optical technology has
implemented only arrays of moderately large numbers of gates (500 × 500) at very
slow (~ms) switching speeds, and alternatively, arrays of small numbers of gates
(2 × 2 to 6 × 6) at fast switching speeds (0.1 µs to 50 ps) [43, 44]. Current
ongoing research in a number of laboratories looks promising in eventually
providing the needed arrays of large numbers of gates with reasonably fast
switching speeds. Alternatively, control of the DOCIP can be easily realized by
using an electronic host instead of the optical control unit, since control of
SIMD systems is primarily a serial process. The trade-off is a possible
inefficiency in the interfaces between electronic and optical units. Because of
this, the all-optical approach may be preferable in the long term. To
efficiently utilize optical gates, they can be interconnected with a 2D optical
multiplexing technique in which a common controllable mask is used for all
cells. The optical multiplexing technique has the following advantages: (1) the
DOCIP will no longer require the broadcasting of instructions from the control
unit; instead, all cells fan their outputs into a common controlling mask pixel;
(2) it will reduce the number of gates; and (3) each cell has a simple
structure, essentially containing only a 3-bit memory with inverting and
noninverting outputs, and a multiple-input OR gate for dilation [45].

To avoid the well-known drawbacks of conventional computers based on
von Neumann principles [23, 38], the machine in Fig. 19 has one instruction
which implements the three fundamental operations of BIA along with fetch and
store. This design uses the parallelism of optics to simultaneously execute
instructions involving all $N^2$ picture elements.

This single instruction has the format

$$(c, d_1, d_2, s_1, s_2, \ldots, s_6, n_1, n_2, \ldots, n_k, y_1, y_2, a_1, a_2, \ldots, a_l, b_1, b_2, \ldots, b_l),$$

where k is determined by the chosen neighborhood configuration $N_c$. The
DOCIP-array requires k = 5 or k = 9 bits for controlling reference image R at a
clock cycle and the DOCIP-hypercube requires k = O(log N) for N cells, and l
defines the maximum length of a program: $2^l$. The functions of these
11 + k + 2l instruction codes are:

•  $c$ is used to select the image from the input or from the feedback;

•  $d_1$, $d_2$, and $d_3$ are used to select the destination memory for
storing the image;

•  $s_1, s_2, \ldots, s_6$ are used to select the output from the memory
elements;

•  $n_1, n_2, \ldots, n_k$ are used to control the neighborhood mask, i.e., to
supply the reference image;

•  $y_1$ and $y_2$ are used to flag an absolute jump or conditional jump;

•  $a_1, a_2, \ldots, a_l$ are the address for jump; and

•  $b_1, b_2, \ldots, b_l$ are the address of the instruction.

TABLE 2
Cellular Image Processor Execution Times for N × N Image Data

Operation             Conventional array   DOCIP-array   DOCIP-hypercube
                      (electronics)        (optics)      (optics)

Local operations      O(1)                 O(1)          O(1)
Global operations     O(N) or O(N^2)       O(N)          O(log N)
Communication         O(N) or O(N^2)       O(1)          O(1)
  PE <-> Main Memory
Input/Output          O(N) or O(N^2)       O(1)          O(1)

Note. Table 2 roughly compares the execution time for the conventional
electronic array processor, the DOCIP-array, and the DOCIP-hypercube.


Order of magnitude execution times for image processing on the DOCIP machines
and on the conventional-array processors are compared in Table 2. In contrast
with the DOCIP-array, the DOCIP-hypercube increases the interconnection
complexity to O(log N), but is able to perform many global operations in
O(log N) time. Compared with the conventional-array processors having serial or
N-parallel input/output, the DOCIP-array will have the same order of performance
in local and global operations but will be improved in input/output performance,
and in principle could be as low as O(1) in I/O operations. The DOCIP-hypercube
will not only be improved in input/output performance but also in global
operations. With external memory, it can also be demonstrated to be general
purpose in the sense of the ability of simulating any Turing machine. One
important feature in the design of the DOCIP-array and DOCIP-hypercube is that
optical 3D free interconnection capabilities can be used to reduce the cell
hardware requirements as well as solve the global connection and I/O problems
which are difficult to solve by planar VLSI technology.
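
The difference between O(N) and O(log N) global operations can be illustrated with one simple case: dilating an image by a horizontal line of 32 pixels. The sketch below is an illustration chosen for this purpose and is not taken from the paper; it assumes the line length is a power of two and counts one dilation per clock cycle.

```python
def dilation(x, r):
    return frozenset((a + c, b + d) for a, b in x for c, d in r)

def line_nearest_neighbor(x, length):
    """Dilate by a horizontal line using only distance-1 connections."""
    steps = 0
    for _ in range(length - 1):
        x = dilation(x, {(0, 0), (1, 0)})
        steps += 1
    return x, steps

def line_hypercube(x, length):
    """Same result using connections at distances 1, 2, 4, ... (length = 2**m)."""
    steps, shift = 0, 1
    while shift < length:
        x = dilation(x, {(0, 0), (shift, 0)})
        shift *= 2
        steps += 1
    return x, steps

X = frozenset({(0, 0), (3, 5)})
slow, slow_steps = line_nearest_neighbor(X, 32)
fast, fast_steps = line_hypercube(X, 32)
```

Each reference image used by `line_hypercube` contains only the cell itself and one cell at a power-of-two distance, so it fits inside the cellular hypercube neighborhood; the nearest-neighbor version needs 31 cycles where the doubling version needs 5.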

6. A PROGRAMMING EXAMPLE: SIZE VERIFICATION

BIA and DOCIP architectures can have many applications in character
recognition, industrial inspection, and medical and scientific research. Since
BIA is able to implement morphological operations efficiently, the DOCIP
machines can efficiently analyze the shape and connectivity of regions as well
as measure their size; they also have the potential to accomplish any image
transformation. Here we illustrate the programming of the DOCIP machines by a
simple size verification algorithm:

•  Problem. Given an input image X with 31 × 31 pixels (Fig. 20) which
contains some square objects $X_i$, we want to preserve those square objects
$X_i$ which satisfy the condition

size of R < size of $X_i$ < size of Q,

where R and Q are reference images as shown in Fig. 21. Other objects will be
eliminated in the output image Y. The expected output image Y is shown in
Fig. 22.

Fig. 20. The input image X.

Fig. 21. The reference images R and Q.

Fig. 22. The expected output image Y.

•  Algebraic expression for the size verification using band pass
morphological filtering (Theorem 3.2),

$$\overline{\overline{(\overline{\bar{X} \oplus \check{R}} \oplus R)} \cup (\overline{\bar{X} \oplus \check{Q}} \oplus Q)},$$

where $\check{R} = R$ and $\check{Q} = Q$ in this special example.

•  Algorithm for the DOCIP-array8.

$$\overline{\overline{(\overline{\bar{X} \oplus E^3} \oplus E^3)} \cup (\overline{\bar{X} \oplus E^4} \oplus E^4)},$$

where E (Fig. 23) is the allowed reference image with the maximum size at a
clock cycle in the DOCIP-array8, and the reference images are
$R = E^3 = E \oplus E \oplus E$ and
$Q = E^4 = E \oplus E \oplus E \oplus E = R \oplus E$.

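The DOCIP-array8 requires 13 steps to complete this algorithm; its program
(instructions) is as follows:

Assume start with $X \to M_1$ (X stored in Memory 1).

 1. $\bar{M}_1 \oplus E \to M_2$   $(= \bar{X} \oplus E)$
 2. $M_2 \oplus E \to M_2$   $(= \bar{X} \oplus E^2)$
 3. $M_2 \oplus E \to M_2$   $(= \bar{X} \oplus E^3)$
 4. $M_2 \oplus E \to M_3$   $(= \bar{X} \oplus E^4)$
 5. $\bar{M}_2 \oplus E \to M_2$   $(= \overline{\bar{X} \oplus E^3} \oplus E)$
 6. $M_2 \oplus E \to M_2$   $(= \overline{\bar{X} \oplus E^3} \oplus E^2)$
 7. $M_2 \oplus E \to M_2$   $(= \overline{\bar{X} \oplus E^3} \oplus E^3)$
 8. $\bar{M}_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus E^4} \oplus E)$
 9. $M_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus E^4} \oplus E^2)$
10. $M_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus E^4} \oplus E^3)$
11. $M_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus E^4} \oplus E^4)$
12. $\bar{M}_2 \cup M_3 \to M_3$   $(= \overline{(\overline{\bar{X} \oplus E^3} \oplus E^3)} \cup (\overline{\bar{X} \oplus E^4} \oplus E^4))$
13. End with $\bar{M}_3 \to Y$   $(= \overline{\overline{(\overline{\bar{X} \oplus E^3} \oplus E^3)} \cup (\overline{\bar{X} \oplus E^4} \oplus E^4)})$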
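
The band pass filter can be simulated with nothing but complement, union, and dilation by the 3 × 3 image E. The sketch below is a software illustration under assumptions: the memory assignment in the comments is one plausible reading of the 13-step program, the test image is invented (it is not the image of Fig. 20), and E is taken to be the full 3 × 3 square.

```python
from itertools import product

N = 31
W = frozenset(product(range(N), repeat=2))         # universal image
E = frozenset(product((-1, 0, 1), repeat=2))       # assumed 3 x 3 reference image

def comp(x):
    return W - x

def dil(x, r):
    return frozenset((a + c, b + d) for a, b in x for c, d in r) & W

def square(x0, y0, side):
    return frozenset(product(range(x0, x0 + side), range(y0, y0 + side)))

def size_verification(x):
    m2 = dil(comp(x), E)          # step 1
    m2 = dil(dil(m2, E), E)       # steps 2-3: complement of X dilated by E^3
    m3 = dil(m2, E)               # step 4:    complement of X dilated by E^4
    m2 = comp(m2)
    for _ in range(3):            # steps 5-7: opening of X by E^3
        m2 = dil(m2, E)
    m3 = comp(m3)
    for _ in range(4):            # steps 8-11: opening of X by E^4
        m3 = dil(m3, E)
    m3 = comp(m2) | m3            # step 12
    return comp(m3)               # step 13

X = square(2, 2, 5) | square(12, 3, 8) | square(3, 16, 10)
Y = size_verification(X)
```

In this test image only the 8 × 8 square lies between R = E^3 (7 × 7) and Q = E^4 (9 × 9), so it is the only object that survives.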


Fig. 23. An allowed reference image E at a clock cycle in the DOCIP-array8
(also allowed in the DOCIP-hypercube8) and its corresponding 9 (or 33) bits in
instruction $(n_1 n_2 \cdots n_k)$ for controlling the neighborhood mask (i.e.,
the reference image for the dilation). DOCIP-array8 instruction code for E:
1 1 1 1 1 1 1 1 1. DOCIP-hypercube8 instruction code for E: 0 0 0 0 0 0 0 0 for
cells at distance 8, 0 0 0 0 0 0 0 0 for cells at distance 4, 0 0 0 0 0 0 0 0
for cells at distance 2, and 1 1 1 1 1 1 1 1 1 for cells at distance 1/0.

Fig. 24. An allowed reference image P at a clock cycle in the DOCIP-hypercube8
(not allowed in the DOCIP-array8) and its corresponding 33 bits (assume 31 × 31
cells) in instruction $(n_1 n_2 \cdots n_{33})$ for controlling the neighborhood
mask (i.e., the reference image for the dilation). DOCIP-hypercube8 instruction
code for P: 0 0 0 0 0 0 0 0 for cells at distance 8, 0 0 0 0 0 0 0 0 for cells
at distance 4, 1 1 1 1 1 1 1 1 for cells at distance 2, and 1 1 1 1 1 1 1 1 1
for cells at distance 1/0.


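•  Algorithm for the DOCIP-hypercube8.

$$\overline{\overline{(\overline{\bar{X} \oplus P \oplus E} \oplus P \oplus E)} \cup (\overline{\bar{X} \oplus P \oplus E^2} \oplus P \oplus E^2)},$$

where P (Fig. 24) and E (Fig. 23) are allowed reference images at a clock cycle
in the DOCIP-hypercube8, and the reference images are
$R = E^3 = P \oplus E$ and $Q = E^4 = P \oplus E^2 = R \oplus E$.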
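
The identities $E^3 = P \oplus E$ and $E^4 = P \oplus E^2$ can be verified directly. The sketch below assumes, from the instruction code in Fig. 24, that P consists of the cell itself plus the cells at distances 1 and 2 in the eight directions, and that E is the full 3 × 3 square; both readings are assumptions.

```python
from itertools import product

def dilation(x, r):
    return frozenset((a + c, b + d) for a, b in x for c, d in r)

E = frozenset(product((-1, 0, 1), repeat=2))
DIRECTIONS = E - {(0, 0)}                         # the eight directions
# Assumed reading of Fig. 24: the cell itself plus cells at distances 1 and 2.
P = frozenset({(0, 0)}) | frozenset(
    (dist * dx, dist * dy) for dist in (1, 2) for dx, dy in DIRECTIONS)

E2 = dilation(E, E)       # 5 x 5 square
E3 = dilation(E2, E)      # 7 x 7 square
E4 = dilation(E3, E)      # 9 x 9 square
```

Because one dilation by P replaces two dilations by E, each E^3 in the array program costs two cycles instead of three, which is where the saving from 13 to 10 steps comes from.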

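The DOCIP-hypercube8 requires 10 steps to complete this algorithm; its program
(instructions) is as follows:

Assume start with $X \to M_1$ (X stored in Memory 1).

 1. $\bar{M}_1 \oplus P \to M_2$   $(= \bar{X} \oplus P)$
 2. $M_2 \oplus E \to M_2$   $(= \bar{X} \oplus P \oplus E)$
 3. $M_2 \oplus E \to M_3$   $(= \bar{X} \oplus P \oplus E^2)$
 4. $\bar{M}_2 \oplus P \to M_2$   $(= \overline{\bar{X} \oplus P \oplus E} \oplus P)$
 5. $M_2 \oplus E \to M_2$   $(= \overline{\bar{X} \oplus P \oplus E} \oplus P \oplus E)$
 6. $\bar{M}_3 \oplus P \to M_3$   $(= \overline{\bar{X} \oplus P \oplus E^2} \oplus P)$
 7. $M_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus P \oplus E^2} \oplus P \oplus E)$
 8. $M_3 \oplus E \to M_3$   $(= \overline{\bar{X} \oplus P \oplus E^2} \oplus P \oplus E^2)$
 9. $\bar{M}_2 \cup M_3 \to M_3$   $(= \overline{(\overline{\bar{X} \oplus P \oplus E} \oplus P \oplus E)} \cup (\overline{\bar{X} \oplus P \oplus E^2} \oplus P \oplus E^2))$
10. End with $\bar{M}_3 \to Y$   $(= \overline{\overline{(\overline{\bar{X} \oplus P \oplus E} \oplus P \oplus E)} \cup (\overline{\bar{X} \oplus P \oplus E^2} \oplus P \oplus E^2)})$

Fig. 25. The locations of the desired objects in the output image Y.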


The  above  programs  can  be  translated  into  the  machine  instruction  codes  directly. 
If  we  want  to  detect  the  geometric  centers  (locations)  of  the  desired  objects,  then  we 
can  use  a  sequential  thinning  to  achieve  the  homotopic  skeleton  (Theorem  3.4)  (Fig. 
25). 


7.  CONCLUSIONS 

We have summarized digital optical cellular image processing, including binary
image algebra (BIA) and the DOCIP architectures. BIA suggests a unified theory of
parallel binary image processing for developing parallel algorithms/languages and
can be generalized to grey-level images. Applications of BIA in binary image
processing are illustrated. The DOCIP architectures, especially the DOCIP-hypercube,
utilize the parallel communication and global interconnection capabilities of
optics to avoid communication bottlenecks and to match BIA parallel algorithms
efficiently. A size verification algorithm is used to demonstrate the programming
of these 1-instruction DOCIP machines. Overall, BIA is a simple, precise, and
complete algebraic theory of binary images; the DOCIP machines have simple
organization, low cell complexity, and potentially fast processing ability.


APPENDIX  A 

Proof of Lemma 2.1. We start with the case $X = R$ and then treat the case
$X \neq R$.

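Case 1. $X = R$, i.e., $\check{R} = \check{X}$. We want to prove

$$\overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}}) \cup \bar{I}} = I \;\leftrightarrow\; \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I = I.$$

1. Claim $I \subset \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})}$

$\leftrightarrow (0,0) \in \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})}$

$\leftrightarrow (0,0) \notin (\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})$

$\leftrightarrow [(0,0) \notin (\bar{X} \oplus \check{X})] \wedge [(0,0) \notin (X \oplus \check{\bar{X}})]$:

(a) Claim $(0,0) \notin (\bar{X} \oplus \check{X})$. Assume $(0,0) \in (\bar{X} \oplus \check{X})$

$\to (0,0) \in \{(a + (-x),\, b + (-y)) \in W \mid (a,b) \in \bar{X},\ (-x,-y) \in \check{X}\}$

$\to (0,0) \in \{(a - x,\, b - y) \in W \mid (a,b) \in \bar{X},\ (x,y) \in X\}$

$\to \exists\, (a - x,\, b - y) = (0,0)$ where $(a,b) \in \bar{X}$, $(x,y) \in X$

$\to \exists\, (x,y) = (a,b)$ where $(a,b) \in \bar{X}$, $(x,y) \in X$

which is impossible, since $(x,y) = (a,b) \in \bar{X}$ contradicts $(x,y) = (a,b) \in X$.
Therefore, the assumption is wrong, and we have $(0,0) \notin (\bar{X} \oplus \check{X})$.

(b) Claim $(0,0) \notin (X \oplus \check{\bar{X}})$. Assume $(0,0) \in (X \oplus \check{\bar{X}})$

$\to (0,0) \in \{(x + (-a),\, y + (-b)) \in W \mid (x,y) \in X,\ (-a,-b) \in \check{\bar{X}}\}$

$\to (0,0) \in \{(x - a,\, y - b) \in W \mid (x,y) \in X,\ (-a,-b) \in \check{\bar{X}}\}$

$\to (0,0) \in \{(x - a,\, y - b) \in W \mid (x,y) \in X,\ (a,b) \in \bar{X}\}$

$\to \exists\, (x - a,\, y - b) = (0,0)$ where $(a,b) \in \bar{X}$, $(x,y) \in X$

$\to \exists\, (x,y) = (a,b)$ where $(a,b) \in \bar{X}$, $(x,y) \in X$

which is impossible, since $(x,y) = (a,b) \in \bar{X}$ contradicts $(x,y) = (a,b) \in X$.
Therefore, the assumption is wrong, and we have $(0,0) \notin (X \oplus \check{\bar{X}})$.

By (a) and (b), we have $[(0,0) \notin (\bar{X} \oplus \check{X})] \wedge [(0,0) \notin (X \oplus \check{\bar{X}})]$, i.e.,

$$I \subset \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})}.$$

We also know $I \subset I$, so we have

$$I \subset \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I.$$

2. Claim $\overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}}) \cup \bar{I}} \subset I$. Since $I \subset I$, it implies

$$\overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}}) \cup \bar{I}} = \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I \subset I.$$

From (1) and (2), we have

$$I \subset \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I,$$

$$\overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}}) \cup \bar{I}} = \overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I \subset I.$$

Thus, by the equivalence of sets, we have $\overline{(\bar{X} \oplus \check{X}) \cup (X \oplus \check{\bar{X}})} \cap I = I$.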
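Case 2. $X \neq R$, i.e., $\check{R} \neq \check{X}$. We want to prove

$$\overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}}) \cup \bar{I}} = \emptyset \;\leftrightarrow\; \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})} \cap I = \emptyset.$$

1. Claim $I \not\subset \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})}$

$\leftrightarrow (0,0) \notin \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})}$

$\leftrightarrow (0,0) \in (\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})$

$\leftrightarrow (0,0) \in (\bar{X} \oplus \check{R}) \vee (0,0) \in (X \oplus \check{\bar{R}})$:

Now we assume $(0,0) \notin (\bar{X} \oplus \check{R}) \wedge (0,0) \notin (X \oplus \check{\bar{R}})$:

(a) If $(0,0) \notin (\bar{X} \oplus \check{R})$

$\to (0,0) \notin \{(a + k,\, b + l) \mid (a,b) \in \bar{X},\ (k,l) \in \check{R}\}$

$\to (a + k,\, b + l) \neq (0,0),\ \forall (a,b) \in \bar{X},\ \forall (k,l) \in \check{R}$

$\to (a,b) \neq (-k,-l),\ \forall (a,b) \in \bar{X},\ \forall (k,l) \in \check{R}$

$\to \forall (k,l) \in \check{R},\ \exists (a,b) \in X,\ (a,b) = (-k,-l)$

$\to \forall (-k,-l) \in R,\ \exists (a,b) \in X,\ (a,b) = (-k,-l)$

$\to \forall (i,j) \in R,\ \exists (a,b) \in X,\ (a,b) = (i,j)$

$\to ((i,j) \in R) \to ((i,j) \in X)$

$\to R \subset X$.

(b) If $(0,0) \notin (X \oplus \check{\bar{R}})$, then $X \subset R$. Since the dilation operation is
commutative, by interchanging the variables X and R and applying the same
procedure as in (a), we have $X \subset R$.

2. By the above (a) and (b), we have $X = R$, which contradicts $X \neq R$.
Thus, the assumption is wrong, and we get

$(0,0) \in (\bar{X} \oplus \check{R}) \vee (0,0) \in (X \oplus \check{\bar{R}})$

$\leftrightarrow I \not\subset \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})}$

$\leftrightarrow \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}})} \cap I = \emptyset$

$\leftrightarrow \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}}) \cup \bar{I}} = \emptyset$.

Hence, by Cases 1 and 2, we have shown that

$$\overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}}) \cup \bar{I}} = \begin{cases} I & \text{if } X = R, \\ \emptyset & \text{otherwise.} \end{cases}$$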
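
The lemma can be checked numerically on a small universal image. The following is a minimal sketch, not part of the original proof: it assumes images are represented as Python `frozenset`s of coordinates, that the universal image has half-width `N = 2` (a hypothetical choice), and that the dilation is clipped to W as in the definition of BIA. The helper names (`complement`, `reflect`, `dilation`, `equality_indicator`) are illustrative only.

```python
import random
from itertools import product

N = 2  # hypothetical half-width; W is the (2N+1) x (2N+1) universal image
W = frozenset(product(range(-N, N + 1), repeat=2))
I = frozenset({(0, 0)})  # elementary image at the origin


def complement(x):
    """Complement of an image with respect to the universal image W."""
    return W - x


def reflect(r):
    """Reflected image: every point (a, b) becomes (-a, -b)."""
    return frozenset((-a, -b) for (a, b) in r)


def dilation(x, r):
    """Dilation clipped to W; the null image if either operand is null."""
    if not x or not r:
        return frozenset()
    return frozenset((a + k, b + l) for (a, b) in x for (k, l) in r) & W


def equality_indicator(x, r):
    """Left-hand side of Lemma 2.1: I if x == r, the null image otherwise."""
    mismatch = dilation(complement(x), reflect(r)) | dilation(x, reflect(complement(r)))
    return complement(mismatch) & I


# Compare the indicator with plain set equality on pseudo-random images.
rng = random.Random(0)
images = [frozenset(p for p in sorted(W) if rng.getrandbits(1)) for _ in range(20)]
images += [frozenset(), W]
for x in images:
    for r in images:
        expected = I if x == r else frozenset()
        assert equality_indicator(x, r) == expected
```

The first dilation contains the origin exactly when R is not a subset of X, and the second exactly when X is not a subset of R, which mirrors parts (a) and (b) of Case 2.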

APPENDIX  B 

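Proof of Theorem 2.1. Consider any image transformation (general case),

$$T: \begin{cases} X_1 \to Y_1 \\ X_2 \to Y_2 \\ \;\;\vdots \\ X_l \to Y_l \end{cases}$$

where $X_i \in P(W)$, $Y_i \in P(W)$, $i = 1, 2, \ldots, l$.

If we choose $R_i = X_i$, $Q_i = Y_i$, $i = 1, 2, \ldots, l$, and use Lemma 2.1 and some
properties of the dilation (i.e., $I \oplus X = X$ and $\emptyset \oplus X = \emptyset$), then we have

$$T(X) = \bigcup_{i=1}^{l} \left\{ \overline{(\bar{X} \oplus \check{R}_i) \cup (X \oplus \check{\bar{R}}_i) \cup \bar{I}} \oplus Q_i \right\}.$$

Since some images $X_i$ may map into the null image $\emptyset$ for a given image
transformation, by Lemma 2.1 we have that

$$T(X) = \bigcup_{i=1}^{k} \left\{ \overline{(\bar{X} \oplus \check{R}_i) \cup (X \oplus \check{\bar{R}}_i) \cup \bar{I}} \oplus Q_i \right\},$$

where $k \le l$ and $l = \#(P(W))$ is the cardinality of $P(W)$.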

APPENDIX  C 

Proof of Theorem 2.2. This can be shown in a very straightforward way. Any
image is a set of image points and is the union of point images (each consisting of one
and only one image point). A point image $\{(i,j)\}$ can be written as

$$\{(i,j)\} = A^i B^j.$$

Hence, the union of all point images which are contained in X is the image X. For
example, an image $X = \{(2,0), (1,-1), (-1,2)\}$ is denoted by

$$X = A^2 \cup A B^{-1} \cup A^{-1} B^2.$$
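
The construction in this proof can be sketched in code. The snippet below is an illustration under stated assumptions rather than the authors' implementation: images are `frozenset`s of coordinates, the universal image is assumed large enough that no clipping is needed, and the names `power`, `point_image`, and `build` are hypothetical helpers. The zeroth power is taken directly as I instead of being formed as $A \oplus A^{-1}$.

```python
from functools import reduce

# The five elementary images.
I = frozenset({(0, 0)})
A = frozenset({(1, 0)})
A_INV = frozenset({(-1, 0)})
B = frozenset({(0, 1)})
B_INV = frozenset({(0, -1)})


def dilation(x, r):
    """Dilation of two images (no clipping: W is assumed large enough)."""
    if not x or not r:
        return frozenset()
    return frozenset((a + k, b + l) for (a, b) in x for (k, l) in r)


def power(pos, neg, n):
    """n-fold dilation of an elementary image; negative n uses its inverse."""
    elem = pos if n >= 0 else neg
    return reduce(dilation, [elem] * abs(n), I)


def point_image(i, j):
    """The point image {(i, j)} written as A^i dilated with B^j."""
    return dilation(power(A, A_INV, i), power(B, B_INV, j))


def build(points):
    """Union of the point images of all given image points."""
    result = frozenset()
    for (i, j) in points:
        result |= point_image(i, j)
    return result


# The example from the text: X = A^2 U A B^-1 U A^-1 B^2.
X = build([(2, 0), (1, -1), (-1, 2)])
print(sorted(X))
```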


APPENDIX  D 

1. Properties of Complement and Difference

The  complement  ,  a  unary  operation,  is  decreasing  and  shift  variant  (considering 
the  outside  of  an  image).  The  difference  X/R,  a  binary  operation,  is  increasing  (but 
decreasing  with  respect  to  the  reference  image  R),  antiextensive  with  respect  to  X, 
and  shift  variant  (the  reference  image  R  is  fixed  once  it  is  given).  Note  that  the 
difference  operation  is  not  commutative,  not  associative,  and  not  distributive  over 
other  operations.  Furthermore,  the  difference  operation  is  more  complicated  than 
the  complement.  Hence,  it  is  preferable  to  employ  the  complement  as  a  fundamental 
operation,  but  not  the  difference.  The  major  properties  of  the  complement  and  the 
difference  are  listed  in  the  following: 

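1. $\bar{X} = W/X$

2. $X/R = X \cap \bar{R}$

3. $\bar{\emptyset} = W$

4. $\bar{W} = \emptyset$

5. $\bar{\bar{X}} = X$ (idempotent for twice complements)

6. $X/\emptyset = X$ (idempotent for a given reference image $R = \emptyset$)

7. $X/X = \emptyset$

8. $X \subset Y \leftrightarrow \bar{Y} \subset \bar{X}$ (decreasing)

9. $X \subset Y \to X/R \subset Y/R$ (increasing)

10. $X/R \subset X$ (antiextensive)

11. $X \subset R \leftrightarrow X/R = \emptyset$

12. $\overline{X \cap R} = \bar{X} \cup \bar{R}$

13. $\overline{X \cup R} = \bar{X} \cap \bar{R}$

14. $X \cap \bar{X} = \emptyset$

15. $X \cup \bar{X} = W$

16. $\overline{X \oplus R} = \bar{X} \ominus \check{R}$, where $\check{R} = \{(-x,-y) \mid (x,y) \in R\}$

17. $\overline{X \ominus R} = \bar{X} \oplus \check{R}$, where $\check{R} = \{(-x,-y) \mid (x,y) \in R\}$.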

2.  Properties  of  Union  and  Intersection 

The union $\cup$, a binary operation, is increasing, extensive, shift variant, idempotent,
commutative, associative, and distributive over intersection. The intersection
$\cap$, a binary operation, is increasing, antiextensive, shift variant, idempotent,
commutative, associative, and distributive over union. The major properties of the union
and the intersection are listed in the following:

1. $X \cup \emptyset = X$
   $X \cap \emptyset = \emptyset$

2. $X \cup X = X$
   $X \cap X = X$

3. $X \cup R = R \cup X$ (commutative)
   $X \cap R = R \cap X$ (commutative)

4. $X \cup (R \cup Q) = (X \cup R) \cup Q$ (associative)
   $X \cap (R \cap Q) = (X \cap R) \cap Q$ (associative)

5. $X \cup W = W$
   $X \cap W = X$ (idempotent for a given reference image $R = W$)

6. $X \cup (R \cap Q) = (X \cup R) \cap (X \cup Q)$ (distributive)
   $X \cap (R \cup Q) = (X \cap R) \cup (X \cap Q)$ (distributive)

7. $X \subset X \cup R$ (extensive)
   $X \cap R \subset X$ (antiextensive)

8. $X \subset Y \to X \cup R \subset Y \cup R$ (increasing)
   $X \subset Y \to X \cap R \subset Y \cap R$ (increasing)

9. $X \subset R \leftrightarrow X \cup R = R$
   $X \subset R \leftrightarrow X \cap R = X$

10. $R \subset X \wedge Q \subset X \to R \cup Q \subset X$
    $X \subset R \wedge X \subset Q \to X \subset R \cup Q$

11. $R \subset X \wedge Q \subset Y \to R \cup Q \subset X \cup Y$
    $R \subset X \wedge Q \subset Y \to R \cap Q \subset X \cap Y$.

3. Properties of Dilation and Erosion


The dilation $\oplus$, a binary operation, is increasing, extensive for a given reference
image R which contains the elementary image I, shift invariant, commutative,
associative, distributive over union, and possesses an identity, which is I. The
erosion $\ominus$, a binary operation, is shift invariant, increasing (but decreasing with
respect to the reference image R), and antiextensive for a given reference image R
which contains the elementary image I. In general, however, the erosion is not
commutative, not associative, not distributive over other operations, and does not
possess a left identity. The major properties of the dilation and the erosion are
listed in the following:

1. $X \oplus R = R \oplus X$ (commutative)
   $X \ominus R \neq R \ominus X$ (in general)

2. $(X \oplus R) \oplus Q = X \oplus (R \oplus Q)$ (associative)
   $(X \ominus R) \ominus Q \neq X \ominus (R \ominus Q)$ (in general)
   $(X \ominus R) \ominus Q = (X \ominus Q) \ominus R$

3. $X \oplus (R \cup Q) = (X \oplus R) \cup (X \oplus Q)$ (distributive)
   $X \ominus (R \cup Q) = (X \ominus R) \cap (X \ominus Q)$
   $X \ominus (R \oplus Q) = (X \ominus R) \ominus Q$

4. $X \oplus I = X = I \oplus X$ (identity)
   $X \ominus I = X \neq I \ominus X$ (in general)

5. $X \oplus \emptyset = \emptyset = \emptyset \oplus X$
   $X \ominus \emptyset = W \neq \emptyset \ominus X$ (in general)

6. $X \subset X \oplus R$ when $I \subset R$ (extensive)
   $X \ominus R \subset X$ when $I \subset R$ (antiextensive)

7. $X \subset Y \to X \oplus R \subset Y \oplus R$ (increasing)
   $X \subset Y \to X \ominus R \subset Y \ominus R$ (increasing)

8. $R \subset Q \to X \oplus R \subset X \oplus Q$
   $R \subset Q \to X \ominus Q \subset X \ominus R$

9. $X \oplus (R \cap Q) \subset (X \oplus R) \cap (X \oplus Q)$ (distributive inequality)
   $X \ominus (R \cap Q) \supset (X \ominus R) \cup (X \ominus Q)$
   $(X \cup Y) \ominus R \supset (X \ominus R) \cap (Y \ominus R)$
   $(X \ominus R) \oplus Q \subset (X \oplus Q) \ominus R$

Remark. "$\supset$" means "contains."

Properties of Some Standard Operations

1. The symmetric difference $\Delta$ is shift variant (with a fixed reference image R),
commutative, and associative. Symbolically,

(a) $X \Delta R = R \Delta X$

(b) $X \Delta (R \Delta Q) = (X \Delta R) \Delta Q$

(c) $X \Delta \emptyset = X$

(d) $X \Delta X = \emptyset$

(e) $X \Delta \bar{X} = W$

(f) $X \Delta W = \bar{X}$

(g) $X \cap (R \Delta Q) = (X \cap R) \Delta (X \cap Q)$

(h) $X \cup (R \Delta Q) \neq (X \cup R) \Delta (X \cup Q)$ (in general)

(i) $X \Delta R = Y \Delta R \to X = Y$.

2. The opening $\circ$ is shift invariant, increasing, antiextensive, and idempotent.
The closing $\bullet$ is shift invariant, extensive, and idempotent. Symbolically,

(a) $X \circ R \subset X \subset X \bullet R$

(b) $X \subset Y \to X \circ R \subset Y \circ R$

(c) $X \subset Y \to X \bullet R \subset Y \bullet R$

(d) $(X \circ R) \circ R = X \circ R$

(e) $(X \bullet R) \bullet R = X \bullet R$.

3.  The  thinning  is  shift  invariant  and  antiextensive.  The  thickening  is  shift 
invariant  and  extensive.  The  major  properties  are  in  the  following: 

(a)  X@RcXcXQfi 

(b)  Xc  y  -»  X®  /?  c  Y@R 

(c)  xcy-xo«cyo/i. 

(d)  If  R  c  Q  (which  means  R{  c  Qt  and  R2  c  Q2).  then  we  have 

RcQ-+X®R<zX®QcX<zXOQcXOR. 

(c)  X"GT7?  =  X®R\  where  «  =  {/?,.  R2)  and  R *  =  (  R2.  /?,). 
APPENDIX E

Proof of Theorem 3.2. We can easily see that (2) in Theorem 3.2 is a generalization
of (1) in Theorem 3.2. (1) is used for exactly matching shapes (or templates)
with shift invariance; (2) extends this to more general cases. For example, to
account for noise and to obtain rotational invariance, we can choose the family $\{R(\theta)\}$
to incorporate all aspect reference image pairs. In the following, we prove (1);
(2) then follows from it directly. The proof demonstrates the mathematical
correspondence between Boolean logic and BIA. The notations $x(i,j)$ and $r(i,j)$
are used to represent the binary values (0 or 1) of pixels at coordinate $(i,j)$ of the
image functions which correspond to the images X and R in BIA notation.

First, let us use the Boolean logic XOR (exclusive or) operation, i.e.,

$$x(i,j)\ \mathrm{XOR}\ r(i,j) = (x(i,j) \wedge \bar{r}(i,j)) \vee (\bar{x}(i,j) \wedge r(i,j)),$$

to achieve the pixelwise comparison, where the output value "0" means that
$x(i,j)$ matches $r(i,j)$ and the output value "1" means that $x(i,j)$
does not match $r(i,j)$.

Second, to check the occurrence of the shape (defined by R with M) in the tested
image X at coordinate $(i,j)$, we have to shift the origin of the shape to the
coordinate $(i,j)$ in X. Then the comparison of the shape with the
subimage in X (limited to the mask M), and the indication of "match" (0) and "not
match" (1), is performed by

$$\bigvee_{(k,l) \in M} \big(x(i+k,\, j+l) \wedge \bar{r}(k,l)\big) \;\vee \bigvee_{(k,l) \in M} \big(\bar{x}(i+k,\, j+l) \wedge r(k,l)\big).$$

If the above expression is considered as a binary operation on two images
$x(i,j)$ and $r(i,j)$, then this operation is not commutative; in order to achieve
commutativity, we replace $(k,l)$ with $(-k,-l)$ and denote $\check{r}(k,l) = r(-k,-l)$:

$$\bigvee_{(-k,-l) \in M} \big(x(i-k,\, j-l) \wedge \bar{\check{r}}(k,l)\big) \;\vee \bigvee_{(-k,-l) \in M} \big(\bar{x}(i-k,\, j-l) \wedge \check{r}(k,l)\big).$$

If the output value of the above expression is "0," then the location
$(i,j)$ of the image X has an occurrence of the shape (defined by R and M); if "1,"
the shape does not occur at $(i,j)$.

Third, let us run over all coordinates $(i,j)$ (i.e., for all $(i,j) \in W$, the universal
image); the union of those coordinates with value "0" is then the answer.
The value "0" at a coordinate $(i,j)$ corresponds to the null image in set notation
and the value "1" at a coordinate $(i,j)$ corresponds to the point image $\{(i,j)\}$. For
convenience, in the following we mix the notations of Boolean logic functions and
set notation; if the output of a Boolean logic expression is "0," it represents the
null image $\emptyset$; if "1," it represents the point image $\{(i,j)\}$. Thus, we have

$$\bigcup_{(i,j) \in W} \Big( \bigvee_{(-k,-l) \in M} \big(x(i-k,\, j-l) \wedge \bar{\check{r}}(k,l)\big) \;\vee \bigvee_{(-k,-l) \in M} \big(\bar{x}(i-k,\, j-l) \wedge \check{r}(k,l)\big) \Big),$$

which is the same as

$$\bigcup_{(i,j) \in W} \Big( \bigvee_{(-k,-l) \in M} \big(x(i-k,\, j-l) \wedge \bar{\check{r}}(k,l)\big) \Big) \;\cup \bigcup_{(i,j) \in W} \Big( \bigvee_{(-k,-l) \in M} \big(\bar{x}(i-k,\, j-l) \wedge \check{r}(k,l)\big) \Big).$$

Since $\bar{x}(i,j) \neq 0$ only when $(i,j) \in \bar{X}$ and $\check{r}(k,l) \neq 0$ only when $(k,l) \in \check{R}$, we
have

$$\bigcup_{(i,j) \in W} \bigvee_{(-k,-l) \in M} \big(\bar{x}(i-k,\, j-l) \wedge \check{r}(k,l)\big)$$

$$= \{(i,j) \mid (i-k,\, j-l) \in \bar{X},\ (k,l) \in \check{R}\}$$

$$= \{(i+k,\, j+l) \mid (i,j) \in \bar{X},\ (k,l) \in \check{R}\}$$

$$= \bar{X} \oplus \check{R}.$$

Similarly, we have

$$\bigcup_{(i,j) \in W} \Big( \bigvee_{(-k,-l) \in M} \big(x(i-k,\, j-l) \wedge \bar{\check{r}}(k,l)\big) \Big) = X \oplus \check{(M/R)}.$$

Hence, if we use "0" to indicate "match," we have

$$(\bar{X} \oplus \check{R}) \cup (X \oplus \check{(M/R)});$$

if we use "1" to indicate "match," then we have

$$\overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{(M/R)})}.$$

Thus, the locations of a shape, which is defined by a nonnull reference image R
with a nonnull reference image (called the mask) M and $R \subset M \subset W$, are the image
points in the following:

$$\overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{(M/R)})} = \overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{(M \cap \bar{R})})} = (X \ominus R) \cap (\bar{X} \ominus (M/R)).$$

A more intuitive illustration is that the foreground X should match R by $X \ominus R$
(using multiple-input AND gates to examine the locations where the 1's should be),
while the background $\bar{X}$ should match $M/R$ by $\bar{X} \ominus (M/R)$. Combining both
results by the intersection (AND), we then implement the shape recognition by
$(X \ominus R) \cap (\bar{X} \ominus (M/R))$. Replacing R by $R_1$ and $(M/R)$ by $R_2$, we obtain the hit
or miss transform (template matching) for shape recognition.
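
As an illustration of this result, the sketch below compares the set-algebraic form $(X \ominus R) \cap (\bar{X} \ominus (M/R))$ with a direct pixel-by-pixel template comparison. It is a minimal example under stated assumptions, not the authors' implementation: images are `frozenset`s of coordinates, the universal image has a hypothetical half-width `N = 5`, the erosion is computed by the complement-dilation-complement identity with dilation clipped to W, and mask positions falling outside W are ignored in the brute-force check. All function names are illustrative.

```python
from itertools import product

N = 5  # hypothetical half-width of the universal image W
W = frozenset(product(range(-N, N + 1), repeat=2))


def complement(x):
    return W - x


def reflect(r):
    return frozenset((-a, -b) for (a, b) in r)


def dilation(x, r):
    if not x or not r:
        return frozenset()
    return frozenset((a + k, b + l) for (a, b) in x for (k, l) in r) & W


def erosion(x, r):
    """Erosion as the complement of the dilated background."""
    return complement(dilation(complement(x), reflect(r)))


def shape_locations(x, r, m):
    """Locations of the shape (r inside mask m): foreground AND background match."""
    return erosion(x, r) & erosion(complement(x), m - r)


def brute_force(x, r, m):
    """Pixelwise comparison of the template with x at every coordinate of W."""
    hits = set()
    for (i, j) in W:
        ok = True
        for (k, l) in m:
            p = (i + k, j + l)
            if p in W and ((p in x) != ((k, l) in r)):
                ok = False
                break
        if ok:
            hits.add((i, j))
    return frozenset(hits)


M = frozenset(product((-1, 0, 1), repeat=2))                     # 3 x 3 mask
R = frozenset({(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)})        # plus-shaped foreground
plus_shape = frozenset((2 + k, 2 + l) for (k, l) in R)           # a plus centred at (2, 2)
full_block = frozenset((-2 + k, -2 + l) for (k, l) in M)         # a 3 x 3 block at (-2, -2)
X = plus_shape | full_block

locations = shape_locations(X, R, M)
print(sorted(locations))
```

Only the plus shape matches: the full block satisfies the foreground condition at its centre but fails the background condition at its corners.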
APPENDIX F

Proof  of  Theorem  3.3.  (1)  The  straightforward  way  for  removing  the  “pepper' 
noise  is  the  thinning  operation  X  ®  R4  (or  X  ©  Rg).  Following  this,  we  have 

X®Rt  =  TU  (  T  ©  /  )  U  (T  ©  M4) 

=  t  u  lu  (X  ©  m4) 

=  jfu(jfn  x  ©  m4) 

=  (Tu  x)  n  (Xu  x  s>  m4) 

=  wn  (xuYWJQ 

=  x  u  x  ©  w4. 


(2)  The  straightforward  way  for  removing  the  “pepper”  noise  is  the  thickening 
operation  -V  O  (?4  (or  X  O  (?„).  Following  this,  we  have 

TQ(?4  =  Tu(T©A/4)u(T©7) 

=  x  u  ( t@  a/4)  u  x 
=  tu  (t  ®  m4  n  JP) 

=  (jfujfswJ  n  (jcu 
=  (jfujfeM,)  n  IT 
=  (tut®  a/4). 

(3)  The  straightforward  way  for  removing  the  “salt  and  pepper”  noise  is  to 
take  the  difference  of  T®  Q4  by  T  ®  R4  (or  the  difference  of  T®  (?„  by  T  ©  /?K). 
By  a  similar  procedure  as  above  we  can  achieve  the  desired  result. 

APPENDIX  G 

Proof of Theorem 3.4. To extract the regions whose sizes are between two
reference images R and Q, the straightforward way is to design a morphological
band pass filter:

$$(X \circ R)/(X \circ Q) = ((X \ominus R) \oplus R)/((X \ominus Q) \oplus Q).$$

To obtain the locations of those desired regions, we then perform the skeletonization:

$$S\big(((X \ominus R) \oplus R)/((X \ominus Q) \oplus Q)\big) = S\big((X \ominus R)/((X \ominus Q) \oplus Q)\big)$$

$$= S\Big(\overline{(\bar{X} \oplus \check{R}) \cup (\overline{\bar{X} \oplus \check{Q}} \oplus Q)}\Big).$$
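
The band pass filter can be illustrated with a short sketch. This is an example under assumptions, not the DOCIP program: images are `frozenset`s of coordinates in a universal image of hypothetical half-width `N = 6`, objects are kept away from the border so that clipping plays no role, and the skeletonization step S is omitted. The names `opening` and `band_pass` are illustrative.

```python
from itertools import product

N = 6  # hypothetical half-width of the universal image W
W = frozenset(product(range(-N, N + 1), repeat=2))


def complement(x):
    return W - x


def reflect(r):
    return frozenset((-a, -b) for (a, b) in r)


def dilation(x, r):
    if not x or not r:
        return frozenset()
    return frozenset((a + k, b + l) for (a, b) in x for (k, l) in r) & W


def erosion(x, r):
    return complement(dilation(complement(x), reflect(r)))


def opening(x, r):
    """Opening: erosion followed by dilation with the same reference image."""
    return dilation(erosion(x, r), r)


def band_pass(x, r, q):
    """Regions that survive opening by r but not opening by q."""
    return opening(x, r) - opening(x, q)


R = frozenset(product(range(2), repeat=2))   # 2 x 2 reference image
Q = frozenset(product(range(3), repeat=2))   # 3 x 3 reference image

single_point = frozenset({(-5, -5)})
small_square = frozenset((-2 + a, 2 + b) for (a, b) in R)    # 2 x 2 object
large_square = frozenset((2 + a, -4 + b) for (a, b) in Q)    # 3 x 3 object
X = single_point | small_square | large_square

selected = band_pass(X, R, Q)
print(sorted(selected))
```

In this example the isolated point is removed by the opening with R, the 3 × 3 object is removed by subtracting the opening with Q, and only the 2 × 2 object remains.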
Image algebra representation of parallel optical binary
arithmetic


Kung-Shiuh Huang, B. Keith Jenkins, and Alexander A. Sawchuk


A binary image algebra (BIA) that gives a mathematical description of parallel processing operations is
described. Rigorous and concise BIA representations of parallel arithmetic and symbolic substitution
operations are given. A sequence of programming steps for implementation of these operations on a parallel
architecture is specified by the BIA representation. Examples of arithmetic operations implemented on a
digital optical cellular image processor architecture are given.


I.  Introduction 

Digital optical systems hold the promise of providing more accuracy, flexibility,
and programmability than analog optical systems, at the cost of somewhat
lower throughput (Refs. 1 and 2). To achieve digital optical computing, there are
at least three possible logic systems: residue logic (Refs. 3-6), multilevel logic
(Refs. 7-10), and binary logic (Refs. 11 and 12). Because it is much easier to make
reliable two-level devices for binary logic and only $\log_2 k$ of them are
needed to represent k levels, in this paper we consider
only binary parallel optical computing. A digital optical
cellular image processor (DOCIP) architecture
based on binary image algebra (BIA) has been demonstrated
to be very powerful in parallel binary image
processing (Refs. 13-16). This paper demonstrates that the
DOCIP with BIA algebraic techniques can also efficiently
perform parallel numerical computations.

Boolean logic equations for binary arithmetic are
not well suited to highly parallel operations on planes
of data; they do not reflect the location of data except
typically by a memory address. Here we first seek a
software theory for parallel numerical computation
algorithms that simultaneously have binary digital efficiency
and the advantages of optical parallel processing.
We have developed a binary image algebra
(BIA) (Ref. 16), built from only three fundamental operations
and five elementary images, to serve as a complete
unified systematic theory for binary parallel image
processing. Now, we show that BIA can also be considered
as a spatial logic which is a generalized parallel
form of Boolean logic with an additional parallel information
transfer ability. BIA then becomes a formalism
and a general technique for developing and comparing
parallel numerical computation algorithms for
digital optical computers. Previous discussions have
relied solely on pictorial descriptions of parallel arithmetic
operations. BIA provides a rigorous and concise
mathematical description of parallel operations. In
this paper we give these rigorous BIA descriptions for
parallel addition, subtraction, and multiplication.

Symbolic substitution has been considered as a
means for implementing parallel optical arithmetic
operations (Refs. 17-19). Symbolic substitution rules can be
described as particular BIA image transformations
(Sec. 5) (Ref. 20). Three different binary number representations
(row-coding, stack-coding, and symbol-coding, as
originally described in Refs. 17-19) for binary arithmetic
in the DOCIP machine are developed. Parallel
operations of binary addition, subtraction, and multiplication
are derived by BIA and illustrated as examples.
Parallelism is achieved by performing arithmetic
operations on many pairs of operands
simultaneously. The carries for each pair of operands
are essentially propagated serially to keep hardware
complexity low (Ref. 21). Thus speed-ups close to linear, and
in some cases equal to linear, can be obtained. In this
paper we consider only positive numbers. A suitable
digital number representation will easily provide for
negative numbers also. For example, two's complement
arithmetic can be performed with only minor
modifications to the algorithms and programs given in
this paper, and with the addition of one more bit (the
sign bit) to each operand and result.

Section 2 gives a brief review of BIA and the DOCIP
architecture. Section 3 presents binary row-coded
arithmetic: binary addition and binary multiplication
(including a matrix-constant multiplication and an
element-element multiplication). Section 4 presents
binary stack-coded arithmetic. Section 5 gives a BIA
representation of symbolic substitution and discusses
binary symbol-coded arithmetic. Section 6 gives a
comparison of the above different number representations.
Binary subtraction is presented in the Appendices
for clarity.

2. Binary Image Algebra (BIA) and DOCIP Architecture

2.1 Review of Binary Image Algebra

Binary image algebra (BIA), extending from mathematical
morphology (Ref. 22), is a synthesis of Boolean logic,
set theory, and image processing. We give here a very
brief summary of BIA. Details are contained in Ref. 16.

A binary digital image is usually defined as a function
f mapping a spatially sampled set of grid points
(x,y) of an orthogonal coordinate system onto the set
composed of two elements: 1 and 0. However, it will
be more convenient for our image algebra to use only
the set of coordinates of pixels that have value 1 to
specify an image. In BIA, an image is then treated as a
set of coordinates of pixels that have value 1. This
paper deals with only binary arithmetic; hence, a pixel
represents a binary bit and an image is a finite 2-D bit
plane. We list here only those basic definitions and
operations which will be referred to later.

Definition of Binary Image Algebra (BIA)

Binary image algebra is an algebra with an image
space S and a family F of five elementary images and
three fundamental operations. Symbolically,

$$\mathrm{BIA} = [P(W);\ \oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1}], \qquad (1)$$

where $S = P(W)$ and $F = (\oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1})$. The
image space S, the family F, and all other symbols are
defined in the following.

(1) The Universal Image (the bit plane containing
all bits with value 1): The universal image is a set
$W = \{(x,y) \mid x \in Z_n,\ y \in Z_n\}$, where $Z_n = \{0, \pm 1, \pm 2, \ldots, \pm n\}$ and
n is a positive integer.

(2) Image Space (the set of all possible bit planes):
The image space is the power set (the set of all subsets)
of the universal image, i.e., $S = P(W)$.

(3) Image (bit plane): A set X is an image if and
only if X is an element of the image space S, i.e., X is a
subimage of the universal image W.

(4) Image Point (a bit with value 1): A sampled
point (bit) (x,y) is an image point of an image X if and
only if (x,y) is an element of the set X.

(5) Image Transformation (a mapping between bit
planes): An image transformation T is a function
mapping the image space S into the image space S.

(6) Three Fundamental Operations (Fig. 1):

(a) Complement of an image X:

$$\bar{X} = \{(x,y) \mid (x,y) \in W \wedge (x,y) \notin X\}. \qquad (2)$$

(b) Union of two images X and R:

$$X \cup R = \{(x,y) \mid (x,y) \in X \vee (x,y) \in R\}. \qquad (3)$$

(c) Dilation of two images X and R:

$$X \oplus R = \begin{cases} \{(x_1 + x_2,\ y_1 + y_2) \in W \mid (x_1,y_1) \in X,\ (x_2,y_2) \in R\} & (X \neq \emptyset) \wedge (R \neq \emptyset), \\ \emptyset & \text{otherwise.} \end{cases} \qquad (4)$$

Fig. 1. Example of fundamental operations: complement $\bar{\ }$,
union $\cup$, and dilation $\oplus$.
Remark: $\in$ denotes belongs to, $\wedge$ denotes and, $\vee$ denotes
or, and $\emptyset$ is the null image having no image point.
Note that X usually represents an input image and R is
a reference image containing predefined information.
We can define other image operations as fundamental
operations instead of these three operations. These
three operations are chosen because of
their simplicity, simple software design, and simple
hardware implementation. Dilation can be interpreted
as a parallel mathematical formalism of the pattern
substitution step in symbolic substitution (Sec. 5).
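
The three fundamental operations translate directly into set manipulations. The following is a minimal sketch, assuming images are stored as Python `frozenset`s of coordinates and the universal image has a hypothetical half-width `N = 3`; the function names are illustrative and do not come from the paper.

```python
from itertools import product

N = 3  # hypothetical value of n; W contains (2N+1) x (2N+1) bits
W = frozenset(product(range(-N, N + 1), repeat=2))


def complement(x):
    """Eq. (2): all points of W that are not in x."""
    return W - x


def union(x, r):
    """Eq. (3): points that belong to x or to r."""
    return x | r


def dilation(x, r):
    """Eq. (4): pairwise coordinate sums kept inside W; null if x or r is null."""
    if not x or not r:
        return frozenset()
    return frozenset((x1 + x2, y1 + y2) for (x1, y1) in x for (x2, y2) in r) & W


I = frozenset({(0, 0)})   # image point at the origin
A = frozenset({(1, 0)})   # image point right of the origin

X = frozenset({(0, 0), (1, 1)})   # input image
R = union(I, A)                   # reference image: origin and its right neighbour

print(sorted(dilation(X, R)))     # each point of X is copied to itself and to its right
```

Dilating by a reference image that contains the origin and its right neighbour copies every image point one position to the right while keeping the original, which is the shift-and-combine behaviour used in the arithmetic algorithms.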

(7) Five Elementary Images: There are five elementary
images:

(a) $I = \{(0,0)\}$, consisting of an image point at the
origin,

(b) $A = \{(1,0)\}$, consisting of an image point right of
the origin,

(c) $A^{-1} = \{(-1,0)\}$, consisting of an image point left
of the origin,

(d) $B = \{(0,1)\}$, consisting of an image point above
the origin,

(e) $B^{-1} = \{(0,-1)\}$, consisting of an image point
below the origin.

In fact, these five elementary images could be reduced
to four elementary images, because $I = A \oplus A^{-1} = B \oplus B^{-1}$.
Any (reference) image can be represented as

$$X = \bigcup_{(i,j) \in X} A^i B^j, \qquad (5)$$

where $A^i B^j = A^i \oplus B^j$,

$A^i = A \oplus A \oplus \cdots \oplus A = \{(i,0)\}$ if $i > 0$,

$A^i = A^{-1} \oplus A^{-1} \oplus \cdots \oplus A^{-1} = \{(i,0)\}$ if $i < 0$,

$A^0 = A \oplus A^{-1} = I$.
(8) Reflected Image: Given an image R, its reflected
image is defined as

$$\check{R} = \{(-x,-y) \mid (x,y) \in R\}. \qquad (6)$$

(9) Some Standard Derived Operations:

(a) Difference of X by R [Fig. 2(a)]:

$$X/R = \{(x,y) \mid (x,y) \in X \wedge (x,y) \notin R\} = X \cap \bar{R} = \overline{\bar{X} \cup R}. \qquad (7)$$

Remark: $\bar{X} = W/X$, where W is the universal image.

(b) Intersection of two images X and R [Fig. 2(b)]:

$$X \cap R = \{(x,y) \mid (x,y) \in X \wedge (x,y) \in R\} = \overline{\bar{X} \cup \bar{R}}. \qquad (8)$$

Remark: $\overline{X \cup R} = \bar{X} \cap \bar{R}$.

(c) Erosion of an image X by a reference image R
[Fig. 2(c)]:

$$X \ominus R = \overline{\bar{X} \oplus \check{R}}, \qquad (9)$$

where $\check{R}$ is defined above. Remark: $\overline{X \oplus R} = \bar{X} \ominus \check{R}$.
The erosion of an image X by a reference image R can
be thought of as the complement of the dilation of the
background by the reflection of the reference image R.
In general, the erosion of a non-null image X by a non-null
reference image R decreases the size of regions,
increases the size of holes, eliminates regions, and
breaks bridges in X.

(d) Symmetric difference of two images [Fig. 2(d)]:

$$X \Delta R = (X/R) \cup (R/X) = \overline{\bar{X} \cup R} \cup \overline{\bar{R} \cup X}. \qquad (10)$$

Remark: The symmetric difference is a commutative
operation, and is its own inverse.

(e) Hit or miss transform $\circledast$ of an image X by an
image pair $R = (R_1, R_2)$ [Fig. 2(e)]:

$$X \circledast R = (X \ominus R_1) \cap (\bar{X} \ominus R_2) = \overline{(\bar{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}. \qquad (11)$$

Remark: The hit or miss transform of an image X by a
reference image pair $R = (R_1, R_2)$ formally describes the
pattern recognition step in symbolic substitution (Sec.
5); it is used to match the shape (or template)
defined by the reference image pair R, where $R_1$ defines
the foreground of the shape and $R_2$ defines the background
of the shape. The conditions are that the
foreground X must match $R_1$ (i.e., $X \ominus R_1$), while
simultaneously the background $\bar{X}$ matches $R_2$ (i.e.,
$\bar{X} \ominus R_2$). Note the similarity of the symmetric difference
(parallel bitwise comparison) and the hit or miss
transform (parallel shape or symbol recognition).
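
The derived operations above can each be written using only complement, union, and dilation, with reflection applied to the reference image. The sketch below illustrates this under stated assumptions: images are `frozenset`s of coordinates, the universal image has a hypothetical half-width `N = 3`, and the function names are illustrative. Results are compared with Python's built-in set operators as a sanity check.

```python
from itertools import product

N = 3  # hypothetical half-width of the universal image W
W = frozenset(product(range(-N, N + 1), repeat=2))


# The three fundamental operations.
def complement(x):
    return W - x


def union(x, r):
    return x | r


def dilation(x, r):
    if not x or not r:
        return frozenset()
    return frozenset((a + k, b + l) for (a, b) in x for (k, l) in r) & W


def reflect(r):
    """Reflected reference image, Eq. (6)."""
    return frozenset((-a, -b) for (a, b) in r)


# Derived operations, written only in terms of the operations above.
def intersection(x, r):
    return complement(union(complement(x), complement(r)))      # Eq. (8)


def difference(x, r):
    return complement(union(complement(x), r))                  # Eq. (7)


def erosion(x, r):
    return complement(dilation(complement(x), reflect(r)))      # Eq. (9)


def symmetric_difference(x, r):
    return union(difference(x, r), difference(r, x))            # Eq. (10)


X = frozenset({(0, 0), (1, 0), (0, 1), (1, 1)})   # a 2 x 2 square
R = frozenset({(0, 0), (1, 0)})                   # a horizontal pair

print(sorted(erosion(X, R)))                # points whose right neighbour is also in X
print(sorted(symmetric_difference(X, R)))   # bits in which X and R differ
```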

The important results of BIA are: (1) any image
transformation can be implemented by the three fundamental
operations with appropriate reference images;
(2) any reference image can be generated from the
elementary images by using the three fundamental
operations; and (3) BIA provides an efficient representation
for many parallel image processing algorithms
(e.g., shape and size verifications; Ref. 16). Here we demonstrate
that BIA is also a fundamental tool for parallel
numerical computation.


Fig. 2. Some standard derived image operations. The shaded
regions in (a)-(d) correspond to pixels with value 1: (a) difference;
(b) intersection; (c) erosion; (d) symmetric difference; (e) hit or miss
transform (template matching).


2.2 Review of DOCIP Architecture

We have designed a class of digital optical cellular
image processors (DOCIPs) for effectively implementing
BIA (Refs. 13-15). Here we only summarize their major
characteristics. Details are given in Refs. 14 and
15. To map BIA into the DOCIP architecture in a
transparent way, we first define the DOCIP algebraically:
Definition of Cellular Automata

A cellular automaton is an algebra $A = (S; F, N_c)$, where $S$ is the state space (a set of states), $F$ is a family of transition functions, and $N_c$ is the neighborhood configuration.

Constraints on a cellular automaton for implementing BIA:

(1) $S \supseteq P(W)$,

(2) $F \supseteq \{\oplus, \cup, \bar{\ }\}$,

(3) $N_c \supseteq I \cup A \cup A^{-1} \cup B \cup B^{-1}$ or $N_c \supseteq A \cup A^{-1} \cup B \cup B^{-1}$,

where $\supseteq$ means contains.

Thus, in terms of cellular automata, the DOCIPs have to satisfy the above constraints for realizing BIA. For storing input images and temporary results in a more flexible way, the DOCIPs utilize three memory modules and all share the same algebraic structure (except the neighborhood configuration):

$$\mathrm{DOCIP} = [P(W \times W \times W);\ \oplus, \cup, \bar{\ }, N_c], \qquad (12)$$

where $\times$ denotes cross product and $N_c$ can be one of the following four types:

(1) DOCIP-array4: each cell connects with its four nearest neighbors and itself, i.e.,

$$N_{\mathrm{array4}} = I \cup A \cup A^{-1} \cup B \cup B^{-1}. \qquad (13)$$

(2) DOCIP-array8: each cell connects with its eight nearest neighbors and itself, i.e.,

$$N_{\mathrm{array8}} = \bigcup_{i,j=-1}^{1} A^{i} B^{j}. \qquad (14)$$

(3) DOCIP-hypercube4: each cell connects with those cells in the 4 directions at distances $1, 2, 4, 8, \ldots, 2^k$ from itself, i.e.,

$$N_{\mathrm{hypercube4}} = \bigcup_{i = 0, \pm 1, \pm 2, \pm 4, \ldots, \pm 2^k} (A^{i} \cup B^{i}), \qquad (15)$$

where $k$ is sufficiently large for the connections to traverse the entire array of cells.

(4) DOCIP-hypercube8: each cell connects with those cells in the 8 directions at distances $1, 2, 4, 8, \ldots, 2^k$ from itself, i.e.,

$$N_{\mathrm{hypercube8}} = \bigcup_{i = 0, \pm 1, \pm 2, \pm 4, \ldots, \pm 2^k} (A^{i} \cup B^{i} \cup A^{i} B^{i} \cup A^{-i} B^{i}). \qquad (16)$$

From the above algebraic description, the DOCIPs have the same algebraic structure and differ only in their neighborhood configurations $N_c$. Thus, they share the same architecture shown in Fig. 3, but have different configurations of the reference images $E$, depending on the optical interconnection network which defines the neighborhood. In practical applications, a larger reference image $R$ can be generated from a set of smaller reference images $E_i$ by a sequential dilation. If it is possible to decompose $R$ into a sequence $R = E_1 \oplus E_2 \oplus \cdots \oplus E_k$, then

$$X \oplus R = (\cdots((X \oplus E_1) \oplus E_2) \oplus \cdots) \oplus E_k. \qquad (17)$$

This decomposition may not exist, in which case $R$ can always be decomposed as $R = R_1 \cup R_2 \cup \cdots \cup R_k$, and then

$$X \oplus R = (X \oplus R_1) \cup (X \oplus R_2) \cup \cdots \cup (X \oplus R_k), \qquad (18)$$

where each $R_j$ can be decomposed into smaller reference images $E_i$.

Fig. 3. Digital optical cellular image processor (DOCIP) architecture: one implementation of binary image algebra (BIA). The DOCIP-array requires 9 (or 5) control bits for reference image $E$. The DOCIP-hypercube requires $O(\log N)$ control bits for reference image $E$.
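The two decompositions can be checked numerically. The sketch below is only an illustration under stated assumptions: images and reference images are represented as sets of integer coordinates on an unbounded plane (so no border clipping occurs), and the names `dilate`, `E1`, `E2`, `R1`, and `R2` are chosen for the example.

```python
def dilate(x, r):
    """Dilation as the union of x translated by every offset in r."""
    return frozenset((px + dx, py + dy) for (px, py) in x for (dx, dy) in r)


X = frozenset({(0, 0), (3, 1)})

# Sequential decomposition, as in Eq. (17): R = E1 (+) E2 is a 2 x 2 square.
E1 = frozenset({(0, 0), (1, 0)})
E2 = frozenset({(0, 0), (0, 1)})
R = dilate(E1, E2)
sequential = dilate(dilate(X, E1), E2)
direct = dilate(X, R)

# Union decomposition, as in Eq. (18): R' = R1 U R2.
R1 = frozenset({(0, 0)})
R2 = frozenset({(2, 0), (2, 1)})
by_parts = dilate(X, R1) | dilate(X, R2)
whole = dilate(X, R1 | R2)

print(sequential == direct, by_parts == whole)
```

Both comparisons hold for any choice of `X`, because dilation is associative and distributes over union.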

Basically,  the  proposed  DOCIP  shown  in  Fig.  3  is  a 
cellular  SIMD  machine  and  consists  of  an  array  of  cells 
or  processing  elements  (PEs)  under  the  supervision  of 
a  control  unit.  The  control  unit  includes  a  clock,  a 
program  counter,  a  test  and  branch  module  for  feed¬ 
back  control,  and  an  instruction  decoder  for  storing 
instructions  and  decoding  them  to  supervise  cells. 
The array of cells includes a 1 × 3 line destination selector, where each line is $N^2$ bits wide, three $N \times N \times 1$ bit memories for storing images, a memory selector, and a dilation unit. It operates as follows: (1) a
binary  image  (N  X  N  matrix)  is  input  into  the  destina¬ 
tion  selector  and  then  stored  in  any  memory  (or  set  of 
memories)  as  the  instruction  specifies;  (2)  after  one  to 
three  images  have  been  stored,  these  images  and  their 
complements  are  piped  into  the  next  stage,  which 
forms  the  union  of  any  combination  of  images  (speci¬ 
fied  by  the  instruction);  (3)  the  result  is  sent  to  a 
dilation  unit  where  the  reference  image  specified  by 
the  instruction  is  used  to  control  the  type  of  dilation; 
(4)  finally,  the  dilated  image  can  be  output,  tested  for 
program  control,  or  fed  back  to  step  (1)  as  the  instruc¬ 
tion  specifies. 

The DOCIP machine (Fig. 3) has one instruction; it implements the three fundamental operations of BIA along with fetch and store. This design uses the parallelism of optics to simultaneously execute instructions involving all $N^2$ picture elements. Each instruction takes one complete cycle to execute. Note that the DOCIP machine can perform a dilation by any reference image $R$ that is a subset of the neighborhood configuration $N_c$ in a single clock cycle.
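The single instruction described above can be modeled in a few lines. This is a hedged sketch, not the hardware design: images are sets of pixel coordinates inside a small universe `W`, the memories are entries of a dictionary, and the function name `docip_instruction` and its argument layout are assumptions made for illustration.

```python
from itertools import product

N = 4
W = frozenset(product(range(N), range(N)))   # all pixel coordinates
I = frozenset({(0, 0)})                      # identity reference image


def docip_instruction(memory, sources, reference, destinations):
    """One cycle: union of selected memories (optionally complemented),
    dilation by the reference image, then store into the destinations."""
    combined = frozenset()
    for index, complemented in sources:
        image = memory[index]
        combined |= (W - image) if complemented else image
    result = frozenset(
        (x + dx, y + dy) for (x, y) in combined for (dx, dy) in reference
    ) & W
    for index in destinations:
        memory[index] = result
    return result


# Intersection of two stored images in two instructions:
X = frozenset({(0, 0), (1, 1), (2, 2)})
R = frozenset({(1, 1), (3, 0)})
memory = {1: X, 2: R, 3: frozenset()}
docip_instruction(memory, [(1, True), (2, True)], I, [3])   # complement(M1) U complement(M2) -> M3
out = docip_instruction(memory, [(3, True)], I, [])         # complement(M3) -> out
```

Here `out` equals the intersection of `X` and `R`, which shows how a derived operation costs a fixed number of cycles regardless of image size.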

The entire system can be realized by an optical gate array with optical 3-D interconnections.12,24 Figure 4 describes an optical implementation concept for the DOCIP architecture. The DOCIP has very low cell hardware complexity to maximize parallelism, yet enough cell sophistication to permit the machine to execute useful programs. The use of optical interconnections permits a cellular hypercube topology to be implemented without paying a large penalty in chip area (the cellular hypercube interconnections are space invariant, which implies relatively low hologram complexity); it also enables images to be input to and output from the machine in parallel.

Fig. 4. DOCIP physical concept. Each processing element (PE) or cell connects with its cellular array or cellular hypercube neighbors and itself by optical 3-D interconnections. The optical hologram provides both intracell and intercell interconnections. Intracell interconnections and imaging optics are omitted for clarity. The input and output sides of the optical gate array are interconnected by an optical feedback path and are shown separately for clarity.

3.  Binary  Row-Coded  Arithmetic 

Binary addition of two $k$-bit numbers yields at most $k + 1$ bits, and binary multiplication of two $k$-bit numbers yields at most $2k$ bits. In this paper, we assume that all input numbers are padded with enough zeros to avoid the possibility of overflow. This also guarantees that the different operands in the image will be treated separately. A binary row-coded number is encoded in a part of a row of an image. Although the word lengths of numbers do not need to be equal, we assume in this discussion that an image (bit plane) with $N \times N$ bits contains $N^2/k$ numbers of $k$-bit length as a simple illustration (Fig. 5). In this section, we describe parallel addition and multiplication by BIA expressions and their programs on the DOCIP machine. Subtraction is discussed in Appendix A.

3.1.  Addition  of  Binary  Row-Coded  Numbers 

Consider an image $X$ [e.g., Fig. 6(a)] composed of $N^2/k$ numbers $x_i$, $i = 1, 2, \ldots, N^2/k$, an image $R$ [e.g., Fig. 6(b)] composed of $N^2/k$ numbers $r_i$, $i = 1, 2, \ldots, N^2/k$, and the output of the addition $S = X + R$ [Fig. 6(c)]. To realize this addition in parallel by means of BIA, we first consider the serial (carry-propagate) addition of two binary numbers $s_i = x_i + r_i$. The first step of serial addition is to add the least significant bits, say $x_{i(0)}$ and $r_{i(0)}$. The Boolean logic equations for adding the two least significant bits (half-adder) are

sum bit: $s_{i(0)} = x_{i(0)}\ \mathrm{XOR}\ r_{i(0)}$,

carry bit: $c_{i(1)} = x_{i(0)}\ \mathrm{AND}\ r_{i(0)}$.

Fig. 5. Binary row-coded numbers.

Fig. 6. Parallel addition of binary row-coded numbers (I): (a) image $X$ of operands; (b) image $R$ of other operands; (c) output $X + R$.

Now, applying the corresponding parallel operations of XOR and AND, i.e., the symmetric difference $\Delta$ and intersection $\cap$, and shifting the set of carry bits by a dilation $\oplus$, we can implement parallel addition by the following recursive equations:

(1) Define the initial states of images of sum bits and carry bits (called sum-bit image and carry-bit image) at time $t_0$ as

$$S(t_0) = X, \qquad C(t_0) = R. \qquad (19)$$

(2) The recursive relation between the states of the sum-bit image and carry-bit image at two adjacent time intervals is then

$$S(t_{i+1}) = S(t_i)\,\Delta\,C(t_i) = \overline{\overline{S(t_i)} \cup C(t_i)} \cup \overline{S(t_i) \cup \overline{C(t_i)}}, \qquad (20)$$

$$C(t_{i+1}) = [S(t_i) \cap C(t_i)] \oplus A^{-1} = \overline{\overline{S(t_i)} \cup \overline{C(t_i)}} \oplus A^{-1}, \qquad (21)$$

where $i = 0, 1, 2, \ldots, k + 1$, and the elementary image $A^{-1}$ is used to shift the carry-bit image one bit to the left for the next iteration.

(3) After a maximum of $k + 1$ iterations, the sum-bit image is the result and the carry-bit image is the null image $\emptyset$:

$$S(t_{k+1}) = X + R, \qquad C(t_{k+1}) = \emptyset. \qquad (22)$$

This procedure is illustrated in Fig. 7. The result of parallel addition of binary numbers with a maximum $k$-bit word size is obtained after $k + 1$ iterations. This algorithm can be implemented in the DOCIP architecture by the program (instructions) given below. $M_1$, $M_2$, and $M_3$ represent the three $N \times N$-bit memories. $X \to M_1$ denotes store $X$ into memory $M_1$. Each numbered line represents a single DOCIP machine instruction for one value of $i$. Comments are in brackets.

Assume start with $X$ in $M_1$ $[= S(t_0)]$ and $R$ in $M_2$ $[= C(t_0)]$.

First to $k$th iterations:

(1) $\bar{M}_1 \cup M_2 \to M_3$ $[= \overline{S(t_i)} \cup C(t_i)]$,

(2) $M_1 \cup \bar{M}_2 \to M_1$ $[= S(t_i) \cup \overline{C(t_i)}]$,

(3) $\bar{M}_1 \cup \bar{M}_2 \cup \bar{M}_3 \to M_2$ $[= \overline{S(t_i)} \cup \overline{C(t_i)}]$,

(4) $\bar{M}_1 \cup \bar{M}_3 \to M_1$ $[= S(t_{i+1})]$,

(5) $\bar{M}_2 \oplus A^{-1} \to M_2$ $[= C(t_{i+1})]$,

where $i = 0, 1, 2, \ldots, k - 1$.

$(k + 1)$th iteration:

(1) $\bar{M}_1 \cup M_2 \to M_3$ $[= \overline{S(t_k)} \cup C(t_k)]$,

(2) $M_1 \cup \bar{M}_2 \to M_1$ $[= S(t_k) \cup \overline{C(t_k)}]$,

(3) $\bar{M}_1 \cup \bar{M}_3 \to \mathrm{out}$ $[= S(t_{k+1}) = X + R]$.

The total number of clock cycles for the execution of this program on the DOCIP machine is $t(k) \le 5k + 3 = O(k)$, which is independent of the number of words being added.
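The recursion of Eqs. (19)-(22) can be sketched in Python. The encoding is an assumption for the example: each image row is stored as one Python integer of `n` bits, so the dilation by $A^{-1}$ becomes a one-bit left shift, and the helper names `pack`, `unpack`, and `parallel_add` are hypothetical.

```python
def pack(words, k):
    """Pack k-bit words into one row; words[0] occupies the rightmost field."""
    value = 0
    for position, word in enumerate(words):
        value |= word << (position * k)
    return value


def unpack(row, k, count):
    """Inverse of pack: read `count` k-bit words out of a row."""
    return [(row >> (position * k)) & ((1 << k) - 1) for position in range(count)]


def parallel_add(x_rows, r_rows, n, k):
    """Add all row-coded words at once with the sum-bit / carry-bit recursion."""
    row_mask = (1 << n) - 1
    s, c = list(x_rows), list(r_rows)            # S(t0) = X, C(t0) = R
    for _ in range(k + 1):
        s, c = (
            [a ^ b for a, b in zip(s, c)],                        # symmetric difference
            [((a & b) << 1) & row_mask for a, b in zip(s, c)],    # intersection, then shift left
        )
    if any(c):
        raise ValueError("operands were not padded enough to avoid overflow")
    return s


n, k = 8, 4
X = [pack([3, 5], k), pack([9, 1], k)]
R = [pack([4, 2], k), pack([6, 7], k)]
S = parallel_add(X, R, n, k)
sums = [unpack(row, k, n // k) for row in S]
```

The loop count depends only on the word size `k`, not on how many words or rows the image holds, which mirrors the $O(k)$ cycle count stated above.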

In fact, BIA can be used to devise a parallel form of a conditional-sum adder or carry-lookahead adder for further extracting additional parallelism, and the execution time of this addition can be reduced to $O(\log_2 k)$. Obviously, a trade-off exists between execution time and hardware complexity. This paper concentrates only on some simple algorithms.
Fig. 7. Parallel addition of binary row-coded numbers (II): the procedure for parallel addition $X + R$, where $X$ and $R$ are shown in Fig. 6, $S(t_5) = S = X + R$, and $C(t_5) = \emptyset$.


3.2.  Multiplication  of  Binary  Row-Coded  Numbers 

Using the representation illustrated in Fig. 5, we define a parallel (matrix-constant) multiplication of an image set of binary numbers and one single binary number, $X \cdot R_r$, and a parallel (element-element) multiplication of two image sets of binary numbers, $X \times R$.

I. Matrix-Constant Multiplication $X \cdot R_r$

Consider an image $X$ [e.g., Fig. 8(a)] comprising $N^2/k$ numbers $x_i$, $i = 1, 2, \ldots, N^2/k$, and a reference image $R_r$ [e.g., Fig. 8(b)] comprising only one single $k$-bit binary number $r = [r_{(k-1)} r_{(k-2)} \ldots r_{(0)}]_2$. The output of the parallel multiplication is $X \cdot R_r$ [Fig. 8(c)]. To realize it, we first consider the serial multiplication of two binary numbers, which is the sum of the shifted versions of the multiplier or the multiplicand. Then, by applying the corresponding parallel operations and parallel shifting by a dilation $\oplus$, we can implement this parallel multiplication by the equation

$$X \cdot R_r = \sum_{l:\ r_{(l)} = 1} X \oplus A^{-l}, \qquad (23)$$

where the sum notation $\sum$ refers to a sequence of parallel additions and the parallel addition $+$ is defined in Subsec. 3.1.

The DOCIP takes $O(k^2)$ clock cycles for implementing this matrix-constant multiplication. Its procedure involves:


Fig. 8. Parallel (matrix-constant) multiplication of binary row-coded numbers: (a) image $X$ of operands; (b) image $R_r$ containing only a single number; (c) output $X \cdot R_r$.


(1) Generating the term $X \oplus A^{-l}$:

The DOCIP-array requires at most $l \le k - 1 = O(k)$ clock cycles, because

$$A^{-l} = (A^{-1})^{l} = \underbrace{A^{-1} \oplus A^{-1} \oplus \cdots \oplus A^{-1}}_{l}, \qquad (24)$$

$$X \oplus A^{-l} = (\cdots((X \oplus A^{-1}) \oplus A^{-1}) \oplus \cdots) \oplus A^{-1}.$$

The DOCIP-hypercube requires at most $\log_2 l \le \log_2 (k - 1) = O(\log_2 k)$ clock cycles, because we can rewrite $l$ as a binary number $l = [a_{(\lfloor \log_2 l \rfloor)} \ldots a_{(1)} a_{(0)}]_2$, and we have

$$A^{-l} = A^{-a_{(0)} 2^{0}} \oplus A^{-a_{(1)} 2^{1}} \oplus \cdots \oplus A^{-a_{(\lfloor \log_2 l \rfloor)} 2^{\lfloor \log_2 l \rfloor}}, \qquad (25)$$

$$X \oplus A^{-l} = (\cdots((X \oplus A^{-a_{(0)} 2^{0}}) \oplus A^{-a_{(1)} 2^{1}}) \oplus \cdots) \oplus A^{-a_{(\lfloor \log_2 l \rfloor)} 2^{\lfloor \log_2 l \rfloor}},$$

where $\lfloor \log_2 l \rfloor$ is the greatest integer less than or equal to $\log_2 l$, and each dilation with $A^{-a_{(j)} 2^{j}}$ can be implemented in the DOCIP-hypercube in one single clock cycle.

The total time delay for generating all required $X \oplus A^{-l}$, $0 \le l \le k - 1$, is bounded by $O(k)$ for both the DOCIP-array and the DOCIP-hypercube. Since

$$X \oplus A^{-l} = [X \oplus A^{-(l-1)}] \oplus A^{-1}, \qquad (26)$$

we can generate the new term $X \oplus A^{-l}$ by simply deriving it from the previous term $X \oplus A^{-(l-1)}$ without starting from the original $X$. The total generating time is then dominated by the number of terms $X \oplus A^{-l}$, which is at most $O(k)$.

(2) Implementing the summation

$$\sum_{l:\ r_{(l)} = 1} X \oplus A^{-l}.$$

The DOCIPs require at most $k - 1 = O(k)$ parallel additions to implement this summation, and each parallel addition requires at most $k + 1 = O(k)$ iterations (as shown in Subsec. 3.1). Since it takes $O(k)$ time for generating all the terms $X \oplus A^{-l}$, the total execution time of the DOCIPs for this matrix-constant multiplication of $k$-bit binary numbers is $O(k) \times O(k) + O(k) = O(k^2)$. From the example shown in Fig. 8, $R_r = I \cup A^{-2}$ contains only a single number $r = (0101)_2 = 5$, and the DOCIP can implement this matrix-constant multiplication $X \cdot R_r$ as follows:

Assume start with $X$ in $M_1$ $(= X \oplus I)$.

(1) $M_1 \oplus A^{-2} \to M_2$ $(= X \oplus A^{-2})$.

(2) The instructions of the parallel addition are performed as shown in Subsec. 3.1:

$M_1 + M_2 \to \mathrm{out}$ $(= X \cdot R_r)$.
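A minimal sketch of the shift-and-add procedure of Eq. (23), under the same assumed encoding as before (each image row is a Python integer, a dilation by $A^{-l}$ is a left shift by `l` bits). The helper names are hypothetical, and the operands are assumed to be padded so that every product fits in `k` bits.

```python
def pack(words, k):
    """Pack k-bit words into one row; words[0] occupies the rightmost field."""
    value = 0
    for position, word in enumerate(words):
        value |= word << (position * k)
    return value


def unpack(row, k, count):
    return [(row >> (position * k)) & ((1 << k) - 1) for position in range(count)]


def parallel_add(x_rows, r_rows, n, k):
    """Sum-bit / carry-bit recursion of Subsec. 3.1 on integer-coded rows."""
    row_mask = (1 << n) - 1
    s, c = list(x_rows), list(r_rows)
    for _ in range(k + 1):
        s, c = (
            [a ^ b for a, b in zip(s, c)],
            [((a & b) << 1) & row_mask for a, b in zip(s, c)],
        )
    return s


def multiply_by_constant(x_rows, r, n, k):
    """Sum of X shifted left by l for every bit position l where r has a 1."""
    row_mask = (1 << n) - 1
    total = [0] * len(x_rows)
    for l in range(k):
        if (r >> l) & 1:
            shifted = [(row << l) & row_mask for row in x_rows]   # X dilated by A^{-l}
            total = parallel_add(total, shifted, n, k)
    return total


n, k = 16, 8
X = [pack([3, 7], k), pack([10, 1], k)]
products = [unpack(row, k, n // k) for row in multiply_by_constant(X, 5, n, k)]
```

With `r = 5` only the terms for `l = 0` and `l = 2` are generated, matching the two-term example $R_r = I \cup A^{-2}$ above.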


II. Element-Element Multiplication $X \times R$

Consider an image $X$ [e.g., Fig. 9(a)] comprising $N^2/k$ numbers $x_i$, $i = 1, 2, \ldots, N^2/k$, and an image $R$ [e.g., Fig. 9(b)] comprising $N^2/k$ numbers $r_i$, $i = 1, 2, \ldots, N^2/k$. The output of the element-element parallel multiplication is $X \times R$ [Fig. 9(c)]. Because the multiplication of two binary numbers is the sum of the shifted versions of the multiplier or the multiplicand, applying the corresponding parallel operations, we can implement this parallel multiplication by the equation

$$X \times R = \sum_{l=0}^{k-1} (X \oplus A^{-l}) \cap \Bigl\{ [R \cap (M \oplus A^{-l})] \oplus \bigcup_{j=0}^{k-l-1} A^{-j} \Bigr\}$$

$$= \sum_{l=0}^{k-1} \overline{\overline{X \oplus A^{-l}} \cup \overline{\overline{\bar{R} \cup \overline{M \oplus A^{-l}}} \oplus \bigcup_{j=0}^{k-l-1} A^{-j}}}, \qquad (27)$$

where the mask $M$ [Fig. 9(d)] is used to extract the $l$th bit [where the 0th bit is least significant and the $(k - 1)$th bit is most significant]. The DOCIPs can implement this element-element multiplication by the procedure


Fig.  9.  Parallel  (element-element)  multiplication  of  binary  row- 
coded  numbers:  (a)  image  X  of  operands;  (b)  image  ft  of  other 
operands;  (c)  output  X  X  ft;  (d)  mask  M ;  (e)  image  u*~,j  A~'\ 

(f)  image  (ft  r,  M)  ®  A~i. 


(1) Generate $X \oplus A^{-l}$ and $R \cap (M \oplus A^{-l})$.
Using an argument similar to that in Subsec. I above, the DOCIP-array takes $O(k)$ time and the DOCIP-hypercube takes $O(\log_2 k)$ time.

(2) Generate $[R \cap (M \oplus A^{-l})] \oplus \bigcup_{j=0}^{k-l-1} A^{-j}$.

The DOCIP-array takes $O(k)$ time, because

$$\bigcup_{j=0}^{k-l-1} A^{-j} = \underbrace{(I \cup A^{-1}) \oplus (I \cup A^{-1}) \oplus \cdots \oplus (I \cup A^{-1})}_{k-l-1}, \qquad (28)$$

where $l \ge 0$, and each dilation by a term in parentheses executes in one clock cycle.

The DOCIP-hypercube takes $O(\log_2 k)$ time, since


U  A_'  = 


;-0 


UoK2(*-/-nJr 

ii 

n=0  *- 


U  A 


o 


(29) 


where $k - l - 1 = [a_{(\lfloor \log_2 (k-l-1) \rfloor)} \ldots a_{(1)} a_{(0)}]_2$, and again each dilation by the term in parentheses executes in one clock cycle.

It takes $O(k)$ time for the DOCIP-array and $O(\log_2 k)$ for the DOCIP-hypercube to generate the term

$$(X \oplus A^{-l}) \cap \Bigl\{ [R \cap (M \oplus A^{-l})] \oplus \bigcup_{j=0}^{k-l-1} A^{-j} \Bigr\}.$$

(3) Implementing the summation

$$\sum_{l=0}^{k-1} \overline{\overline{X \oplus A^{-l}} \cup \overline{\overline{\bar{R} \cup \overline{M \oplus A^{-l}}} \oplus \bigcup_{j=0}^{k-l-1} A^{-j}}}.$$

The summation requires at most $(k - 1)$ addition operations, and each addition operation takes $O(k)$ time on the DOCIP system. We also require $O(k)$ time for the DOCIP-array and $O(\log_2 k)$ time for the DOCIP-hypercube to generate each operand of the addition. Thus, for this element-element multiplication of $k$-bit binary numbers, the total computation time is $O(k^3)$ for the DOCIP-array and $O(k^2 \log_2 k)$ for the DOCIP-hypercube.
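The three steps above can be sketched as follows. This is an illustrative model only, with assumptions stated up front: rows are Python integers, the mask `lsb_mask` plays the role of $M$ (the least significant bit of every word), left shifts stand in for dilations by $A^{-l}$ and $A^{-j}$, and every product is assumed to fit in `k` bits. All helper names are hypothetical.

```python
def pack(words, k):
    """Pack k-bit words into one row; words[0] occupies the rightmost field."""
    value = 0
    for position, word in enumerate(words):
        value |= word << (position * k)
    return value


def unpack(row, k, count):
    return [(row >> (position * k)) & ((1 << k) - 1) for position in range(count)]


def parallel_add(x_rows, r_rows, n, k):
    """Sum-bit / carry-bit recursion of Subsec. 3.1 on integer-coded rows."""
    row_mask = (1 << n) - 1
    s, c = list(x_rows), list(r_rows)
    for _ in range(k + 1):
        s, c = (
            [a ^ b for a, b in zip(s, c)],
            [((a & b) << 1) & row_mask for a, b in zip(s, c)],
        )
    return s


def elementwise_multiply(x_rows, r_rows, n, k):
    row_mask = (1 << n) - 1
    lsb_mask = sum(1 << p for p in range(0, n, k))        # M: bit 0 of every word
    product_rows = [0] * len(x_rows)
    for l in range(k):
        bit_positions = (lsb_mask << l) & row_mask        # M dilated by A^{-l}
        terms = []
        for x, r in zip(x_rows, r_rows):
            selected = r & bit_positions                  # l-th bit of every r_i
            spread = 0
            for j in range(k - l):                        # dilation by union of A^{-j}, j = 0..k-l-1
                spread |= (selected << j) & row_mask
            terms.append(((x << l) & row_mask) & spread)  # shifted x_i kept only where bit l of r_i is 1
        product_rows = parallel_add(product_rows, terms, n, k)
    return product_rows


n, k = 16, 8
X = [pack([3, 10], k), pack([12, 1], k)]
R = [pack([5, 6], k), pack([13, 200], k)]
products = [unpack(row, k, n // k) for row in elementwise_multiply(X, R, n, k)]
```

The spread mask is what keeps the partial product of one word from leaking into its neighbor: a shifted operand is retained only inside the bit range belonging to its own word.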

Multiplication requires more than three memories. This can be accommodated by either building more memory into the DOCIP machine or by swapping intermediate results into and out of an external memory. In the latter case we assume the external memory can be loaded and unloaded with one image in a single time step. In Sec. 4, binary stack-coded arithmetic also requires more than three memories; we make the same assumptions on the use of an external memory.

For binary column-coded arithmetic, a number is encoded in a part of a column of an image as in Fig. 10. All the algorithms derived in this section can also be applied to binary column-coded numbers except that we replace the elementary image $A^{-1}$ by a different elementary image $B$ for shifting the carry-bit image or borrow-bit image in the vertical direction.


4.  Binary  Stack-Coded  Arithmetic 

In this case, a number is encoded in a stack of $k$ image planes with the least significant bit in the first plane, the next least significant bit in the second plane, etc. (Fig. 11). We assume all numbers including the results of arithmetic operations can be represented in $k$ bits, so that $k$ images, each with $N \times N$ bits, contain $N^2$ binary numbers. Here, we describe parallel addition and multiplication by BIA expressions. Subtraction is discussed in Appendix B.

4.1. Addition of Binary Stack-Coded Numbers

Using the representation illustrated in Fig. 11, we consider the parallel addition of two sequences of images of binary numbers. Assume a sequence of images $X = [X_{(k-1)}, X_{(k-2)}, \ldots, X_{(0)}]$ [e.g., Fig. 12(a)] storing $N^2$ binary numbers $x_i$, $i = 1, 2, \ldots, N^2$, and a sequence of images $R = [R_{(k-1)}, R_{(k-2)}, \ldots, R_{(0)}]$ [e.g., Fig. 12(b)] storing $N^2$ numbers $r_i$, $i = 1, 2, \ldots, N^2$. Then the output of the parallel addition is $X + R = S = [S_{(k)}, S_{(k-1)}, \ldots, S_{(0)}]$ as shown in Fig. 12(c). To realize this addition using our three fundamental operations, we implement an array of full adders as described by the equations

(1) The least significant bit planes of sum bits and carry bits are given by

$$S_{(0)} = X_{(0)}\,\Delta\,R_{(0)} = \overline{\bar{X}_{(0)} \cup R_{(0)}} \cup \overline{X_{(0)} \cup \bar{R}_{(0)}}, \qquad (30)$$

$$C_{(1)} = X_{(0)} \cap R_{(0)} = \overline{\bar{X}_{(0)} \cup \bar{R}_{(0)}}. \qquad (31)$$

(2) The recursive relations:

$$S_{(i)} = X_{(i)}\,\Delta\,R_{(i)}\,\Delta\,C_{(i)} = \overline{\bar{X}_{(i)} \cup R_{(i)} \cup C_{(i)}} \cup \overline{X_{(i)} \cup \bar{R}_{(i)} \cup C_{(i)}} \cup \overline{X_{(i)} \cup R_{(i)} \cup \bar{C}_{(i)}} \cup \overline{\bar{X}_{(i)} \cup \bar{R}_{(i)} \cup \bar{C}_{(i)}}, \qquad (32)$$

$$C_{(i+1)} = [X_{(i)} \cap R_{(i)}] \cup [X_{(i)} \cap C_{(i)}] \cup [R_{(i)} \cap C_{(i)}] = \overline{\bar{X}_{(i)} \cup \bar{R}_{(i)}} \cup \overline{\bar{X}_{(i)} \cup \bar{C}_{(i)}} \cup \overline{\bar{R}_{(i)} \cup \bar{C}_{(i)}}, \qquad (33)$$

where $i = 1, 2, \ldots, k - 1$.

(3) The final solution is

$$X + R = S = [S_{(k)}, S_{(k-1)}, \ldots, S_{(0)}], \qquad (34)$$

where $S_{(k)} = C_{(k)}$ because $X_{(k)} = R_{(k)} = \emptyset$.

Fig. 10. Binary column-coded numbers.

Fig. 11. Binary stack-coded numbers. $x_{i(m)}$ represents the $m$th bit of the $i$th number in the image plane. $X_{(0)}$ represents the image plane of least significant bits and $X_{(k-1)}$ represents the image plane of most significant bits.

This algorithm can be implemented in the DOCIP architecture by the program (DOCIP instructions):

Assume start with $X_{(0)}$ stored in $M_1$ and $R_{(0)}$ stored in $M_2$.

Calculate $S_{(0)}$ and $C_{(1)}$:

(1) $\bar{M}_1 \cup \bar{M}_2 \to M_3$ & out $[= \bar{C}_{(1)}]$,

(2) $\bar{M}_1 \cup M_2 \to M_1$ $[= \bar{X}_{(0)} \cup R_{(0)}]$,

(3) $\bar{M}_1 \cup \bar{M}_2 \cup \bar{M}_3 \to M_2$ $[= X_{(0)} \cup \bar{R}_{(0)}]$,

(4) $\bar{M}_1 \cup \bar{M}_2 \to \mathrm{out}$ $[= S_{(0)}]$.

Calculate $S_{(1)}$ and $C_{(2)}$:

(1) $X_{(1)} \to M_1$,

(2) $\bar{M}_1 \cup \bar{M}_3 \to M_2$ $[= \bar{X}_{(1)} \cup C_{(1)}]$,

(3) $M_1 \cup M_3 \to M_1$ $[= X_{(1)} \cup \bar{C}_{(1)}]$,
(33) 
(4) $\bar{M}_1 \cup \bar{M}_2 \to M_1$ $[= X_{(1)}\,\Delta\,C_{(1)}]$,

(.r.)  It,,,  -Af,. 

(6)  M,  L  Af,  -  Af„, 

(7)  A/,  w  Af.,  -  Af,. 

(8)  Af,  u  Af  |  -  out|  —  .S„,|, 

(9)  A',,,  *Af„ 

(10)  K„,  •  Af., 

(11)  Af,  u  Af,  -Af„ 

(12)  CIM  *  Af,, 

(18)  Af,  u  Af,  *  Af>, 

(14)  Af,  u  Af,  *  Af  „ 

(15)  A, , ,  -»  Af,, 

(16)  Af,  u  Af,  -  Af,. 

(17)  Af,  ij  Af *  Af ,  &  out)  =  C,,,j. 

Calculate $S_{(2)}$ to $S_{(k-1)}$ and $C_{(3)}$ to $C_{(k)}$:

Use the same instructions for calculating $S_{(1)}$ and $C_{(2)}$ except that $X_{(1)}$ and $R_{(1)}$ [and $S_{(1)}$ and $C_{(2)}$] are replaced by $X_{(i)}$ and $R_{(i)}$ [and $S_{(i)}$ and $C_{(i+1)}$] in each iteration, and in the beginning of an iteration the memory $M_3$ stores $\bar{C}_{(i)}$ instead of $\bar{C}_{(1)}$, $i = 2, 3, \ldots, k$.

The complete execution of this operation in the DOCIP requires $t(k) \le 17(k - 1) + 4 = 17k - 13 = O(k)$ clock cycles. Additional parallelism could be extracted to further reduce the execution time by utilizing carry-lookahead techniques or by optimizing the above program.
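The full-adder recursion for stack-coded numbers can be sketched in Python. The encoding is an assumption for the example: each bit plane is stored as one integer whose bit `p` belongs to pixel `p`, so bitwise operators act on all pixels at once. The function names are hypothetical.

```python
def to_planes(values, k):
    """Bit plane m holds bit m of every number; pixel p maps to bit p of the plane."""
    return [
        sum(((v >> m) & 1) << p for p, v in enumerate(values))
        for m in range(k)
    ]


def from_planes(planes, count):
    """Rebuild the numbers stored at each of `count` pixels from the planes."""
    return [
        sum(((plane >> p) & 1) << m for m, plane in enumerate(planes))
        for p in range(count)
    ]


def stack_add(x_planes, r_planes):
    """Ripple the carry plane from the least significant plane upward."""
    carry = 0                                   # no carry into plane 0
    sum_planes = []
    for x, r in zip(x_planes, r_planes):
        sum_planes.append(x ^ r ^ carry)                        # three-way symmetric difference
        carry = (x & r) | (x & carry) | (r & carry)             # majority of the three planes
    sum_planes.append(carry)                    # S(k) = C(k)
    return sum_planes


k = 4
X = to_planes([1, 7, 15, 8], k)
R = to_planes([1, 9, 15, 0], k)
totals = from_planes(stack_add(X, R), 4)
```

The number of plane operations grows with `k` only; the number of pixels affects just the width of each bitwise operation, which is the part that the optical hardware performs in parallel.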

4.2.  Multiplication  of  Binary  Stack-Coded  Numbers 

Let the result of the parallel multiplication be $X \times R = M = [M_{(2k-1)}, M_{(2k-2)}, \ldots, M_{(0)}]$ [e.g., Fig. 12(e)]. Since binary multiplication is equivalent to the addition of shifted versions of the multiplicand, applying the corresponding parallel operations, we can implement the parallel multiplication by the equations

$$P^{(l)} = [P^{(l)}_{(2k-1)}, P^{(l)}_{(2k-2)}, \ldots, P^{(l)}_{(0)}] \qquad (35)$$

$$= [\underbrace{\emptyset, \emptyset, \ldots, \emptyset}_{k-l},\ X_{(k-1)} \cap R_{(l)},\ X_{(k-2)} \cap R_{(l)},\ \ldots,\ X_{(0)} \cap R_{(l)},\ \underbrace{\emptyset, \emptyset, \ldots, \emptyset}_{l}], \qquad (36)$$

$$X \times R = M = \sum_{l=0}^{k-1} P^{(l)} = P^{(0)} + P^{(1)} + \cdots + P^{(k-1)}, \qquad (37)$$

where $l = 0, 1, \ldots, k - 1$, and the addition $+$ is defined in Subsec. 4.1. Since this parallel multiplication requires at most $k - 1$ additions, each addition takes $O(k)$ time for the DOCIP, and each $P^{(l)}$ can be generated in $O(k)$ time, the total execution time is $O(k^2)$.

Fig. 12. Parallel arithmetic with binary stack-coded numbers: (a) sequence of images $X = [X_{(3)}, X_{(2)}, X_{(1)}, X_{(0)}]$; (b) sequence of images $R = [R_{(3)}, R_{(2)}, R_{(1)}, R_{(0)}]$; (c) sum $X + R = [S_{(4)}, S_{(3)}, S_{(2)}, S_{(1)}, S_{(0)}]$; (d) difference $D = X - R = [D_{(3)}, D_{(2)}, D_{(1)}, D_{(0)}]$; (e) product $M = X \times R = [M_{(7)}, M_{(6)}, \ldots, M_{(0)}]$.

5. Symbolic Substitution and Binary Symbol-Coded Arithmetic

Symbolic substitution was first considered as a means of utilizing the parallelism of optics by Huang.17 Recently, the use of symbolic substitution as a basis for digital optical computing has been reported in Refs. 17-19 and 25-32. Special symbolic substitution rules can be applied to perform arithmetic operations and simulate a Turing machine.19 Symbolic substitution demonstrates the ability to solve any computable problem and performs many operations. Here we formalize symbolic substitution by BIA algebraic symbols, demonstrate that symbolic substitution rules are particular BIA image transformations, and give the BIA formal notations of binary symbol-coded (symbolic substitution) arithmetic.

5.1. BIA Representation of Symbolic Substitution

In this subsection we give the BIA equation for symbolic substitution and show how it can be implemented on the DOCIP machine. A symbolic substitution rule involves two steps: (1) recognizing the locations of a certain search-pattern within the 2-D binary input data, and (2) substituting a replacement-pattern wherever the search-pattern is recognized. We derive it by BIA in the following steps (illustrated in Fig. 13):

1.  BIA  Notations  for  Symbolic  Substitution 

2-D  binary  input  data  =  image  (bit  plane)  X. 
Symbol  to  be  recognized  (search-pattern)  =  refer¬ 
ence  image  (or  image  pairs)  R. 

Symbol  to  be  replaced  (replacement-pattern)  =  ref¬ 
erence  image  Q. 
Fig. 13. BIA representation of symbolic substitution: search pattern $R$ (foreground and background), recognition $X \circledast R$, replacement pattern $Q$, and substitution output $(X \circledast R) \oplus Q$ or $[(X \circledast R) \cap M] \oplus Q$. A symbolic substitution rule is a hit or miss transform followed by a dilation. The optional mask $M$ is for controlling the block search region.


2.  Symbolic  Substitution  Rule 

Step  1,  recognition  of  the  search-pattern: 

(a) Foreground recognizer: the locations of a certain spatial search-pattern $R_1$ (defined by its foreground) within the foreground of the 2-D input data $X$ can be recognized by the erosion of $X$ by $R_1$:

$$X \ominus R_1 = \overline{\bar{X} \oplus \check{R}_1}, \qquad (38)$$

where $\check{R}$ denotes the reflection of $R$ about the origin.

(b) Background recognizer: the locations of a certain spatial search-pattern $R_2$ within the background of the 2-D input data $X$ can be recognized by the erosion of $\bar{X}$ by $R_2$:

$$\bar{X} \ominus R_2 = \overline{X \oplus \check{R}_2}. \qquad (39)$$

(c) Full recognizer: by combining the two above steps, the locations of a certain spatial search-pattern $R = (R_1, R_2)$ ($R_1$ defines the foreground, and $R_2$ defines the background) within the 2-D input data $X$ can be recognized by the hit or miss transform of $X$ and $R$:

$$X \circledast R = (X \ominus R_1) \cap (\bar{X} \ominus R_2) = \overline{(\bar{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)}. \qquad (40)$$

Step 2, substitution of the replacement-pattern:

Substituter: a new replacement-pattern $Q$ can be substituted for $R$ wherever the search-pattern $R$ is recognized by the dilation of $X \circledast R$ by $Q$.

Synthesis:

A complete symbolic substitution rule is implemented by the hit or miss transform of $X$ by $R$ followed by the dilation by $Q$:

$$(X \circledast R) \oplus Q = [(X \ominus R_1) \cap (\bar{X} \ominus R_2)] \oplus Q = \overline{(\bar{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)} \oplus Q. \qquad (41)$$


Fig. 14. Symbolic substitution system with $p$ symbolic substitution rules.


3.  Symbolic  Substitution  System  (Fig.  14) 

To work with more than one rule (say $p$ substitution rules) for practical applications, a symbolic substitution processor produces several copies of the input $X$, provides $p$ different recognizer-substituter units, and then combines the outputs of various units to form a new output. Thus, a symbolic substitution system is implemented by

$$\bigcup_{i=1}^{p} (X \circledast R^{(i)}) \oplus Q^{(i)}, \qquad (43)$$

where $R^{(i)}$ and $Q^{(i)}$, $i = 1, 2, \ldots, p$, are the reference image pairs and replacement patterns in the $i$th symbolic substitution rule. This, then, is the BIA formula for general symbolic substitution.

Hence, a general mathematical formalism of symbolic substitution has been developed. For a local search-pattern and replacement-pattern (i.e., $R_1, R_2, Q \subseteq N_{\mathrm{array}}$ or $N_{\mathrm{hypercube}}$), the DOCIP-array or DOCIP-hypercube can implement a symbolic substitution rule in four (or five with the optional mask) steps:

Assume start with $X$ in $M_1$.

(1) $\bar{M}_1 \oplus \check{R}_1 \to M_2$,

(2) $M_1 \oplus \check{R}_2 \to M_3$,

(3) $M_2 \cup M_3 \to M_1$,

(4) $\bar{M}_1 \oplus Q \to \mathrm{out}$ $[= (X \circledast R) \oplus Q]$.

Let the pixels used in the substitution rule(s) of a symbolic substitution processor be the neighborhood $N_{ss}$ of the processor. We see from the above steps that the DOCIP can simulate the symbolic substitution processor in constant time if the two machines have the same neighborhood. If $N_{ss}$ is not a subset of the DOCIP neighborhood, the simulation will take longer. In either case, it is not presently known how many steps it takes the symbolic substitution processor to simulate the DOCIP.


Optional masking:

An optional mask $M$ can be used for controlling the block search region. A symbolic substitution rule can be modified as

$$[(X \circledast R) \cap M] \oplus Q. \qquad (42)$$

By proper choice of $M$, the search can be made in overlapping, disjoint, or noncontiguous blocks.
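A symbolic substitution rule, with and without the optional mask, can be sketched as a hit or miss transform followed by a dilation. The representation is an assumption made for this example: images are sets of pixel coordinates inside a universe `W`, reference images are sets of offsets, pixels outside `W` are ignored, and the function names are hypothetical.

```python
from itertools import product

N = 5
W = frozenset(product(range(N), range(N)))  # all pixel coordinates


def dilate(x, r):
    """Union of copies of x translated by every offset in r, clipped to W."""
    return frozenset((px + dx, py + dy) for (px, py) in x for (dx, dy) in r) & W


def reflect(r):
    return frozenset((-dx, -dy) for (dx, dy) in r)


def hit_or_miss(x, r1, r2):
    """Complement of the union of two dilations, following the form of Eq. (40)."""
    misses = dilate(W - x, reflect(r1)) | dilate(x, reflect(r2))
    return W - misses


def substitute(x, r1, r2, q, mask=None):
    """Recognize the pattern (r1, r2), optionally mask the hits, then write q at each hit."""
    hits = hit_or_miss(x, r1, r2)
    if mask is not None:
        hits &= mask
    return dilate(hits, q)


# Rule: a bright pixel whose right neighbor is dark is replaced by a bright
# pixel one position to the right.
X = frozenset({(0, 0), (1, 0), (3, 2)})
R1 = frozenset({(0, 0)})     # foreground part of the search pattern
R2 = frozenset({(1, 0)})     # background part of the search pattern
Q = frozenset({(1, 0)})      # replacement pattern

unmasked = substitute(X, R1, R2, Q)
masked = substitute(X, R1, R2, Q, mask=frozenset({(1, 0)}))
```

The mask only restricts where recognized locations are accepted; the recognition and replacement steps themselves are unchanged, which is why the masked rule costs one extra step.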


5.2.  Binary  Symbol-Coded  (Symbolic  Substitution) 
Arithmetic 

A bit in a binary number is encoded symbolically as pixels of an image (Fig. 15). In this subsection, we primarily concentrate on single-pixel coding: a logic value (0 or 1) is represented by a single pixel (dark or bright) [Fig. 15(a)], as in the binary row and stack-coded number representations, but the operands of binary numbers $x_i$ and $r_i$ are stored in the same input image $X$ as shown in Fig. 16(a). The expected output images of symbolic substitution for binary addition and binary subtraction are shown in Figs. 16(b) and (c). To achieve these desired operations, the symbols associated with the operands are recognized and then replaced by new symbols associated with the results of the operation. Systems for implementing binary addition and subtraction are formalized and illustrated as examples of binary symbol-coded arithmetic below.

Fig. 15. Bit encoded as a symbol: (a) single-pixel coding of zero and one (a bit is a pixel); (b) two-pixel coding of zero and one (a bit is encoded as two pixels) (adapted from Refs. 18 and 19); (c) six-pixel coding of zero and one (a bit with value zero or one is encoded as six pixels) (adapted from Ref. ,‘H).

5.2.1.  Addition  of  Binary  Symbol-Coded  Numbers 
This parallel binary addition (Fig. 17) can be implemented with four symbolic substitution rules [Fig. 17(a)].17,18 In the case of single-pixel coding, as we will show, Rule 1 is not necessary. The symbolic substitution system for single-pixel coding can be realized as

Fig. 16. Binary symbol-coded (symbolic substitution) binary arithmetic: (a) input image X contains the operands $x_i$ and $r_i$; (b) output of parallel addition; (c) output of parallel subtraction.

$Y(t_0) = X,$  (44)

$Y(t_{j+1}) = \bigcup_{i=1}^{4} \{[Y(t_j) \circledast R^{(i)}] \cap M\} \oplus Q^{(i)},$  (45)

where $Y(t_{k+1})$ is the result; $j = 0, 1, 2, \ldots, k$; $k$ is the word size (i.e., the number of bits in an operand); and $R^{(i)} = (R_1^{(i)}, R_2^{(i)})$ and $Q^{(i)}$ are shown in Fig. 17(b) and represented as


111 

"  0. 

1  =  u: 

V 

(21 

K? 

=  I. 

H'P 

=  /i. 

</-'  = 

/, 

Cl) 

H\" 

=  /(, 

KV" 

=  i. 

(/"  = 

/. 

(4) 

R\" 

=  U,' 

I{\" 

=  0, 

w 

Here the null image $\emptyset$ and the elementary images are as defined in Subsec. 2.1; the mask M [Fig. 17(c)], used for controlling the block search region, is the image corresponding to the coordinates of the origins (lower-left pixels) of the input symbols in the input image X. An example is given in Fig. 17(d). Note that $Q^{(1)} = \emptyset$ implies

$\{[Y(t_j) \circledast R^{(1)}] \cap M\} \oplus Q^{(1)} = \emptyset,$  (46)

so that


Fig. 17. Parallel addition of binary symbol-coded numbers: (a) four symbolic substitution rules for addition; (b) reference image pairs $R^{(i)}$ and reference images $Q^{(i)}$, i = 1,2,3,4, used for addition; $Q^{(1)}$ is a null image. Rule 1 is not needed for this single-pixel coding; (c) mask M; (d) example of parallel addition of binary symbol-coded numbers.

Fig. 18. Symbolic substitution binary addition with two-pixel coding: (a) reference image pairs $R^{(i)}$ and reference images $Q^{(i)}$, i = 1,2,3,4, used for addition (with two-pixel coding) (adapted from Refs. 18 and 19); (b) mask M.

Fig. 19. Symbolic substitution binary addition with encoding a bit as six pixels (adapted from Ref. 31).


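$Y(t_{j+1}) = \bigcup_{i=1}^{4} \{[Y(t_j) \circledast R^{(i)}] \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=2}^{4} \{[Y(t_j) \circledast R^{(i)}] \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=2}^{4} \{\overline{[\overline{Y(t_j)} \oplus \check{R}_1^{(i)}] \cup [Y(t_j) \oplus \check{R}_2^{(i)}]} \cap M\} \oplus Q^{(i)}.$  (47)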

Thus,  for  single-pixel  coding  of  symbolic  substitution, 
we  can  reduce  the  four  rules  of  binary  addition  to  only 
three  rules.  However,  this  reduction  of  complexity 
cannot  be  applied  to  two-pixel  (i.e.,  dual-rail)  or  six- 
pixel  coding. 
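
The reduction to three rules can be illustrated with a small simulation. The following is a minimal sketch, not the layout of Fig. 17: it assumes each symbol is a vertical pixel pair whose origin is the lower pixel, that one operand bit sits in the upper pixel and the other in the lower pixel, and that a carry is written into the upper row one bit to the left. The helper names (`shift`, `hit_or_miss`, `dilate`, `symbolic_add`) and the offset-list encoding of the reference images are hypothetical.

```python
import numpy as np

UP = (-1, 0)      # (row, col) offset of the pixel above the symbol origin
ORIGIN = (0, 0)

# Each rule: (foreground offsets, background offsets, replacement offsets).
# Rule 1 (both bits 0) would write a null image, so it is left out.
RULES = [
    ([ORIGIN], [UP], [ORIGIN]),       # Rule 2: lower bit 1, upper bit 0 -> sum 1
    ([UP], [ORIGIN], [ORIGIN]),       # Rule 3: upper bit 1, lower bit 0 -> sum 1
    ([ORIGIN, UP], [], [(-1, -1)]),   # Rule 4: both 1 -> carry, one column left
]

def shift(img, dr, dc):
    """Translate a boolean image by (dr, dc); pixels leaving the frame are dropped."""
    h, w = img.shape
    pad = max(abs(dr), abs(dc))
    padded = np.pad(img, pad)  # zero (False) border
    return padded[pad - dr:pad - dr + h, pad - dc:pad - dc + w]

def hit_or_miss(img, fg, bg):
    """Mark every origin whose fg offsets are all 1 and whose bg offsets are all 0."""
    hit = np.ones_like(img, dtype=bool)
    for dr, dc in fg:
        hit &= shift(img, -dr, -dc)
    for dr, dc in bg:
        hit &= ~shift(img, -dr, -dc)
    return hit

def dilate(img, offsets):
    """Union of translated copies of img; an empty offset list gives the null image."""
    out = np.zeros_like(img)
    for dr, dc in offsets:
        out |= shift(img, dr, dc)
    return out

def encode(xs, rs, k):
    """Stack the operand pairs: x in the upper row, r in the lower row, MSB left."""
    img = np.zeros((2 * len(xs), k + 1), dtype=bool)
    for n, (x, r) in enumerate(zip(xs, rs)):
        for b in range(k):
            img[2 * n, k - b] = (x >> b) & 1
            img[2 * n + 1, k - b] = (r >> b) & 1
    return img

def decode(img, k):
    """Read the lower row of every pair back as an integer."""
    return [sum(int(img[2 * n + 1, k - b]) << b for b in range(k + 1))
            for n in range(img.shape[0] // 2)]

def symbolic_add(xs, rs, k):
    """Add k-bit operand pairs in parallel with k + 1 masked substitution passes."""
    y = encode(xs, rs, k)
    mask = np.zeros_like(y)
    mask[1::2, :] = True  # symbol origins: the lower pixel of every pair
    for _ in range(k + 1):
        nxt = np.zeros_like(y)
        for fg, bg, q in RULES:
            nxt |= dilate(hit_or_miss(y, fg, bg) & mask, q)
        y = nxt
    return decode(y, k)
```

For example, `symbolic_add([5, 9], [3, 6], 4)` adds the pairs 5 + 3 and 9 + 6 in the same image. Passing an empty replacement list to `dilate` returns the null image, which is why a rule with a null replacement image contributes nothing to the union.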

When implemented on the DOCIP, this addition requires at most k + 1 iterations, each iteration requiring two union operations of three results of symbolic substitution rules, and each rule is realized within five steps as shown in Subsec. 5.1. Thus, the total execution time in the DOCIP is

$t(k) \le (3 \times 5 + 2)(k + 1) = 17(k + 1) = O(k).$

When using two or six pixels to represent a logic value [Figs. 15(b) and (c)], we can formalize symbolic substitution addition as follows.

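With two-pixel coding [Fig. 15(b)],18,19 we can implement a full recognition with only a background recognizer (or foreground recognizer):

$Y(t_{j+1}) = \bigcup_{i=1}^{4} \{[Y(t_j) \circledast R^{(i)}] \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=1}^{4} \{[Y(t_j) \ominus R_1^{(i)}] \cap [\overline{Y(t_j)} \ominus R_2^{(i)}] \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=1}^{4} \{\overline{[Y(t_j) \oplus \check{R}_2^{(i)}]} \cap M\} \oplus Q^{(i)},$  (48)

where $j = 0, 1, 2, \ldots, k$; $R^{(i)} = (R_1^{(i)}, R_2^{(i)})$ and $Q^{(i)}$ are shown in Fig. 18(a) and represented by elementary images as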

(1)  K\"  =  /  fl  .  fl',"  =  II  IV.  if"  =  /•,/!  'fl'. 

(2)  fl',’1  =  '  j;  i  if.  =  l  IV.  Q  '1  =/(,i/i 


CD  i "  =  /  o  fl’,  K?/'  =  Ur_i  f 

(4)  Rlltl  =  flu8:l,  ft(/'  =  /oK-,  <V4  =  /  ^  A-'/i'; 

and the mask M is shown in Fig. 18(b). Since

$\overline{[\overline{Y(t_j)} \oplus \check{R}_1^{(i)}] \cup [Y(t_j) \oplus \check{R}_2^{(i)}]} = \overline{[\overline{Y(t_j)} \oplus \check{R}_1^{(i)}]} = \overline{[Y(t_j) \oplus \check{R}_2^{(i)}]},$  (49)

for the two-pixel coding, $R^{(i)}$ can be represented by only its foreground $R_1^{(i)}$ or background $R_2^{(i)}$. For implementation on the DOCIP, this algorithm requires four rules, and each rule involves two dilations and one union or intersection. Because they may not be included in $N_{array}$ or $N_{hypercube}$, each dilation by $R_2^{(i)}$ or $Q^{(i)}$ is implemented in 2-4 steps for the DOCIP-array8 and 1-2 steps for the DOCIP-hypercube8. The total execution time is bounded by 28(k + 1) for the DOCIP-array8 and 18(k + 1) for the DOCIP-hypercube8. Moreover, it requires more difficult two-pixel coding and doubles the device area.

With six-pixel coding [Fig. 15(c)],31 the mask M is not needed and

$Y(t_{j+1}) = \bigcup_{i=1}^{4} [Y(t_j) \circledast R^{(i)}] \oplus Q^{(i)},$  (50)

where $j = 0, 1, 2, \ldots, k$; $k$ is the word size; $R^{(i)} = (R_1^{(i)}, R_2^{(i)})$ and $Q^{(i)}$ are shown in Fig. 19 and are represented as

(1)  fl1,"  =  /  u  Afl  u  fl-  u  AH'. 

fl1,"  =  fl  u  «:'  u  A  u  AR:  u  (Uf.,,  A'-fl'I. 

Q'"  =  A-'R-  o  A  -B'  u  I  u  AH. 

(2)  =  (U?.,  «'»  o  A  u  A', 

Rl/'  =  /  u  IV  u  (Uf.,  AH')  (U,\„  A  H'), 

</-’  =  A  'If  u  A'lV  w  R  W  A, 

(3)  fl1,'"  =  I  u  IV  v  (Uf-.,  AW). 

fl'2:"  =  (vj©  «*»  uAuA'v  ('j;.,,  A’H'I. 
if"  =  A  'R1  u  A  -H\.  R  <  A. 

(4)  R\"  =  fl  u  H:l  u  A  u  Afl-. 

=  /  u  fl-  u  A/i  u  Afl1 ..  (U;‘.„  A-’fl'l, 
if"  =  A  'lV  w  A  'ft-  u  /  u  Afl 

The six-pixel coding removes the need for the mask M, but requires more difficult encoding, more difficult implementation of the hit or miss transform by $R^{(i)}$ and dilation by $Q^{(i)}$, and six times the hardware area. Addition on the DOCIP-array or DOCIP-hypercube using six-pixel coding takes much more time (a factor of more than 10) than the time required for single-pixel coding or two-pixel coding.

6.  Complexity  of  Parallel  Optical  Binary  Arithmetic 

We have shown that BIA offers a general tool for mapping serial binary arithmetic into different forms of parallel binary arithmetic (including binary row-coding, binary stack-coding, and three coding techniques for symbolic substitution arithmetic) in a precise and compact way. The complexity of parallel addition and subtraction of two N × N arrays of binary numbers (each number with k-bit length) for these different number representations is compared in Tables I and II. Binary row-coded arithmetic requires the smallest number O of fundamental operations. Binary stack-coded arithmetic requires the lowest number of processing elements (or cells) P and the


Table I. Complexity of Parallel Optical Binary Addition of Two N × N Arrays of k-Bit Binary Numbers; Each Parallel Fundamental Operation Corresponds to P Processing Elements Executing in Parallel


Number 

Representation 

Bnary 

Row  coding 

Binary 

Stack  coding 

Symbolic 
Cubs*  tut  ion 
(single  pixel 
coding) 

Symbolic 
Substitution 
(two  pixel 

coding) 

No  ot  Dilations 
(or  Erosions) 

k 

0 

9(k.1) 

8(k.1) 

No  of  Unions 
(or  Intersections) 

4k.  3 

16k  12 

I5(k.1) 

16(k.1) 

No  of 

Complements 

7k  .4 
•2k. 2 

t?(k*1) 

*3<k.1) 

TotaINo  of  Parallel 
Fundamental  O 
Operations 

12k.7 

*7k.5 

36k  25 
•23k  17 

36(k.1) 

*27(k.1) 

40<k.1) 

*28(k.1) 

No  ot 

Processing  p 

Elements 

kN> 

N7 

2kN* 

4k  N7 

Total  No  of 
Computations 

(l2k.7)kbK 

•<7k.5)kN> 

(36k  25}N> 
*(22k  161^ 

76k(k«1  )N7 
•SAkfk.IJN7 

l60k(k.l)N7 

•IIJktk.IJN7 

DOCIP  T 

Execution  Time 

5k. 3 

17k- 13 

I7(k.1) 

18(k.t)  or 
2B(k.1) 

PxT 

* indicates the number of operations when erosion and intersection are also allowed.


Table II. Complexity of Parallel Optical Binary Subtraction of Two N × N Arrays of k-Bit Binary Numbers


Number 

Representation 

Binary 

Row  coding 

Binary 

Stack  coding 

Symbolic 
Substitution 
(single  pixel 
codmg) 

Symbolic 
Substitution 
(two  pixel 
coding) 

No  of  Dilations 
'Cf  E  fOSiOnsI 

k 

0 

6(k.1) 

8(k  ♦  1 ) 

NO  0*  Unions 

(O'  Intersections) 

4fc.3 

1 6fc  1? 

10(k.1| 

I6(k.i) 

No  Of 

Complements 

6k  .4 
•3k.? 

??k  18 
*  1  t  k  8 

8<k.l) 

•?<K.1» 

16(k.1) 

*4(k« I ) 

lua 

Ilk. 7 
•«k.6 

41k  33 
•33k  ?6 

?4|k.1) 

*  1  8(k»  1 ) 

40(k.  1 ) 
*?8<k.1) 

No  of 

Processing  f» 

f  lements 

k  N; 

N7 

7kNJ 

4k  N7 

Toiai  No  of 
Computations 

(1  Ik./HN' 
*!8k.5)kN' 

(43k  33)N7 
’(Ilk  ?6)N' 

4flk(k«  1)N7 
•36k(k.1)N> 

IGOkfk.DN7 
‘1 1Pk(k.1)N7 

4k.  t 

?0k  17 

I1(k.1i 

1 8(k  ♦  1 )  Or 
?8(k.f) 

PpT 

(4k  .  3j*  N7 

f?0fc  171N7 

??klk»1  IN' 

* indicates the number of operations when erosion and intersection are also allowed.


smallest overall O × P complexity (assuming each parallel fundamental operation corresponds to P processing elements executing in parallel). For the normal case in which the word size is larger than one and much smaller than the image size (1 < k ≪ N), binary row-coded arithmetic can be implemented in the DOCIP with the fastest computation speed (assuming the DOCIP can input all operands in an image at a time). For implementation on the DOCIPs, the complexity of binary symbol-coded (symbolic substitution) arithmetic is in all cases higher than that of binary row-coded and binary stack-coded arithmetic. For implementing symbolic substitution algorithms on the DOCIPs, the single-pixel coding is superior to the other symbol coding techniques.
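
The speed comparison can be tabulated for a given word size. This is a minimal sketch; the bounds are transcribed from the DOCIP execution time row of Table I and from the text of Subsec. 5.2.1, and the exact constants for the row-coded and stack-coded cases should be treated as assumptions. The function name `docip_addition_steps` is hypothetical.

```python
def docip_addition_steps(k):
    """Upper bounds on DOCIP steps for parallel addition of k-bit numbers."""
    return {
        "row-coded": 5 * k + 3,
        "stack-coded": 17 * k - 13,
        "symbolic, single-pixel": 17 * (k + 1),
        "symbolic, two-pixel (hypercube8)": 18 * (k + 1),
        "symbolic, two-pixel (array8)": 28 * (k + 1),
    }

if __name__ == "__main__":
    # Print the bounds for a few word sizes, fastest representation first.
    for k in (4, 8, 16):
        steps = docip_addition_steps(k)
        ranked = sorted(steps.items(), key=lambda item: item[1])
        print(k, ranked)
```

Under these bounds the row-coded representation has the fewest steps for every word size k > 1, and the single-pixel symbolic substitution bound is the smallest of the symbol-coded ones.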

7.  Conclusion 

Optical computers can operate on 2-D planes of data in parallel. Boolean logic equations do not provide a complete description of such parallel operations for binary arithmetic. An optical system that operates on planes of data should employ an inherently parallel mathematical description for its arithmetic. In this paper we use binary image algebra (BIA) to develop parallel numerical computation algorithms, and to describe the execution of these algorithms on a digital optical cellular image processor (DOCIP) architecture.

BIA is demonstrated to be a general technique for developing and formulating parallel numerical and non-numerical computation algorithms for digital optical computers. The DOCIP is a simple optical architecture for effectively implementing BIA. Symbolic substitution is a subset of BIA and can be formalized in compact BIA expressions. Three different techniques for parallel optical binary arithmetic, based on binary row-coding, binary stack-coding, and binary symbol-coding (symbolic substitution), are illustrated for implementation on the DOCIP. Binary row-coding arithmetic has fast DOCIP execution, and binary stack-coding arithmetic requires the lowest number of computations O × P. In summary, BIA and the DOCIP represent a simple yet powerful parallel digital optical algorithmic and architectural technique for both numerical and non-numerical applications.
Appendix A: Subtraction of Binary Row-Coded Numbers

Let the output of the parallel subtraction be D = X − R [e.g., Figs. 20(a)-(c)]. To realize it, we first consider the serial binary subtraction of two binary numbers, $d_i = x_i - r_i$. Subtracting the least significant bits $x_{i(0)}$ and $r_{i(0)}$ generates a difference bit $d_{i(0)}$ and a borrow bit $b_{i(1)}$. The Boolean logic equations for subtracting the two least significant bits (half-subtractor) are

difference bit: $d_{i(0)} = x_{i(0)}$ XOR $r_{i(0)}$,

borrow bit: $b_{i(1)} = \overline{x_{i(0)}}$ AND $r_{i(0)}$.

Now, applying the corresponding parallel operations and shifting the set of borrow bits by a dilation ⊕, we can implement the parallel subtraction as follows:

(1) Define the initial states of images of difference bits and borrow bits (called difference-bit image and borrow-bit image) at time $t_0$ as

$D(t_0) = X, \quad B(t_0) = R.$  (51)

(2) The recursive relation between the states of the difference-bit image and borrow-bit image at two adjacent time intervals is

$D(t_{i+1}) = D(t_i) \,\Delta\, B(t_i) = \overline{\overline{D(t_i)} \cup B(t_i)} \cup \overline{D(t_i) \cup \overline{B(t_i)}},$  (52)

$B(t_{i+1}) = [\overline{D(t_i)} \cap B(t_i)] \oplus A^{-1} = \overline{D(t_i) \cup \overline{B(t_i)}} \oplus A^{-1},$  (53)

where $i = 0, 1, 2, \ldots, k + 1$, and the elementary image $A^{-1}$ is used to shift the borrow-bit image one bit to the left for the next iteration.

(3) After a maximum of k + 1 iterations, the difference-bit image is the result and the borrow-bit image becomes the null image $\emptyset$:

$D(t_{k+1}) = X - R, \quad B(t_{k+1}) = \emptyset.$  (54)

This procedure is illustrated in Fig. 20(d). The result of parallel subtraction of binary numbers with a maximum k-bit word size is obtained after k + 1 iterations. The DOCIP architecture can realize this by the following program (instructions):

Assume we start with X in $M_1$ [= $D(t_0)$] and R in $M_2$ [= $B(t_0)$].

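First to kth iterations:

(1) $\overline{\overline{M_1} \cup M_2} \to M_3$ [= $\overline{\overline{D(t_i)} \cup B(t_i)}$],

(2) $\overline{M_1 \cup \overline{M_2}} \to M_2$ [= $\overline{D(t_i) \cup \overline{B(t_i)}}$],

(3) $M_3 \cup M_2 \to M_1$ [= $D(t_{i+1})$],

(4) $M_2 \oplus A^{-1} \to M_2$ [= $B(t_{i+1})$],

where $i = 0, 1, 2, \ldots, k - 1$.

(k + 1)th iteration:

(1) $\overline{\overline{M_1} \cup M_2} \to M_3$ [= $\overline{\overline{D(t_k)} \cup B(t_k)}$],

(2) $\overline{M_1 \cup \overline{M_2}} \to M_2$ [= $\overline{D(t_k) \cup \overline{B(t_k)}}$],

(3) $M_3 \cup M_2 \to M_1$ [= $D(t_{k+1}) = X - R$].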
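
The recursion of Eqs. (51)-(54) can be checked numerically. The following is a minimal sketch under simplifying assumptions: each image row holds a single k-bit number with the most significant bit on the left, every minuend is at least as large as its subtrahend, and the dilation by $A^{-1}$ is modeled as a one-pixel shift to the left. The helper names are hypothetical.

```python
import numpy as np

def to_bits(values, k):
    """Encode each integer as one image row of k pixels, MSB on the left."""
    return np.array([[(v >> (k - 1 - c)) & 1 for c in range(k)] for v in values],
                    dtype=bool)

def from_bits(img):
    """Decode every image row back into an integer."""
    k = img.shape[1]
    return [sum(int(bit) << (k - 1 - c) for c, bit in enumerate(row)) for row in img]

def shift_left(img):
    """Move every pixel one column to the left; the leftmost column is dropped."""
    out = np.zeros_like(img)
    out[:, :-1] = img[:, 1:]
    return out

def row_coded_subtract(X, R):
    """Iterate D <- D xor B, B <- (not D and B) shifted left, for k + 1 passes."""
    D, B = X.copy(), R.copy()
    for _ in range(X.shape[1] + 1):
        D, B = D ^ B, shift_left(~D & B)
    return D, B

if __name__ == "__main__":
    X = to_bits([9, 12, 31], 5)
    R = to_bits([5, 3, 31], 5)
    D, B = row_coded_subtract(X, R)
    print(from_bits(D), B.any())
```

All rows are processed in the same pass, which is the point of the row-coded form: the number of passes depends on the word size k, not on how many numbers the image holds.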



Fig. 20. Parallel subtraction of binary row-coded numbers: (a) image X of operands; (b) image R of other operands; (c) output X − R; (d) procedure for parallel subtraction X − R.


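The total number of clock cycles in the DOCIP to complete this subtraction process is $t(k) \le 4k + 3 = O(k)$.

Appendix B: Subtraction of Binary Stack-Coded Numbers

Let the result of the parallel subtraction be $X - R = D = [D_{(k-1)}, D_{(k-2)}, \ldots, D_{(0)}]$ [e.g., Fig. 12(d)]. To realize it using the three fundamental operations, we consider a serial full-subtractor. Applying the corresponding parallel operations, we can implement this parallel subtraction by the equations

(1) The least significant bit planes of difference bits and borrow bits:

$D_{(0)} = X_{(0)} \,\Delta\, R_{(0)} = \overline{\overline{X_{(0)}} \cup R_{(0)}} \cup \overline{\overline{R_{(0)}} \cup X_{(0)}},$  (55)

$B_{(1)} = \overline{X_{(0)}} \cap R_{(0)} = \overline{X_{(0)} \cup \overline{R_{(0)}}}.$  (56)

(2) The recursive relations:

$D_{(i)} = [\overline{X_{(i)}} \cap \overline{R_{(i)}} \cap B_{(i)}] \cup [\overline{X_{(i)}} \cap R_{(i)} \cap \overline{B_{(i)}}] \cup [X_{(i)} \cap \overline{R_{(i)}} \cap \overline{B_{(i)}}] \cup [X_{(i)} \cap R_{(i)} \cap B_{(i)}]$

$\qquad = \overline{X_{(i)} \cup R_{(i)} \cup \overline{B_{(i)}}} \cup \overline{X_{(i)} \cup \overline{R_{(i)}} \cup B_{(i)}} \cup \overline{\overline{X_{(i)}} \cup R_{(i)} \cup B_{(i)}} \cup \overline{\overline{X_{(i)}} \cup \overline{R_{(i)}} \cup \overline{B_{(i)}}},$  (57)

$B_{(i+1)} = [\overline{X_{(i)}} \cap \overline{R_{(i)}} \cap B_{(i)}] \cup [\overline{X_{(i)}} \cap R_{(i)} \cap \overline{B_{(i)}}] \cup [\overline{X_{(i)}} \cap R_{(i)} \cap B_{(i)}] \cup [X_{(i)} \cap R_{(i)} \cap B_{(i)}]$

$\qquad = \overline{X_{(i)} \cup R_{(i)} \cup \overline{B_{(i)}}} \cup \overline{X_{(i)} \cup \overline{R_{(i)}} \cup B_{(i)}} \cup \overline{X_{(i)} \cup \overline{R_{(i)}} \cup \overline{B_{(i)}}} \cup \overline{\overline{X_{(i)}} \cup \overline{R_{(i)}} \cup \overline{B_{(i)}}},$  (58)

where $i = 0, 1, 2, \ldots, k - 1$.

(3) The final solution:

$X - R = D = [D_{(k-1)}, \ldots, D_{(0)}].$  (59)
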

This algorithm can be implemented in the DOCIP architecture by the program (instructions):

Assume we start with $X_{(0)}$ in $M_1$ and $R_{(0)}$ in $M_2$.
Calculate $D_{(0)}$ and $B_{(1)}$:

( 1 )  At,  u  Af,  —  Mi  &  out  j  =  B, , 

(2)  Af,  u  Af,  -  Af,  |=  X„„  o  Anil, 

(3)  Af,  u  Af  i  *  out  1= 

Calculate $D_{(1)}$ and $B_{(2)}$:
(1)  A',„  -  Af,,

(2)  Af,  v  At,  -  Afj, 

CD  Af,  w  At,  -  Af,. 
(4)  *  A/„ 

(ft)  Af.,  U  Af:(  -  A/  ,, 
(6)  Af,  u  A/,  *  Af  „ 




(7)  Af,  !..■  Af ,  -A/,. 

(8)  ft,,,  *  Af,. 

(9)  A/,  Af,  *  Af,. 

(10)  Af,  u  Af,,  *  Af,„ 

till  A’, ,,  *  Af,,, 

(12)  Af,  u  Af,  *  Af,. 

(13)  fl,,,  -  Af,. 

(14)  Af,  v.,  Af„  *  Af„. 

(1ft)  Af,  y  Af.,  *  out  [=  />,,,]. 

(16)  A',,,  *  Af  ,. 

(17)  Af,  Af,  -  Af,, 

(18)  ft,,,  *Af„ 

(19)  Af,  u  Af;,  -  Af,, 

(20)  Af,  u  Af2  -*  Af,,  &  out  (=  8(2,1- 

Calculate $D_{(2)}$ to $D_{(k-1)}$ and $B_{(3)}$ to $B_{(k)}$:

Use the same instructions as for calculating $D_{(1)}$ and $B_{(2)}$, except that $X_{(1)}$ and $R_{(1)}$ [and $D_{(1)}$ and $B_{(2)}$] are replaced by $X_{(i)}$ and $R_{(i)}$ [and $D_{(i)}$ and $B_{(i+1)}$] in each iteration, and at the beginning of an iteration the memory $M_3$ stores $B_{(i)}$ instead of $B_{(1)}$, $i = 2, 3, \ldots, k$.

Therefore, the total execution time in the DOCIP to complete this parallel subtraction is $t(k) \le 20(k - 1) + 3 = 20k - 17 = O(k)$.
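
The full-subtractor recursion of Eqs. (55)-(58) can be sketched on bit planes as follows. This is an illustrative model, not the DOCIP program: it uses intersections directly instead of the complement-and-union forms, assumes each minuend is at least as large as its subtrahend, and starts from a null borrow plane so that the first pass reduces to the half-subtractor. The function names are hypothetical.

```python
import numpy as np

def to_planes(values, k):
    """Split an integer image into k boolean bit planes, least significant first."""
    return [((values >> i) & 1).astype(bool) for i in range(k)]

def from_planes(planes):
    """Recombine boolean bit planes (LSB first) into an integer image."""
    return sum(plane.astype(int) << i for i, plane in enumerate(planes))

def stack_coded_subtract(X_planes, R_planes):
    """Apply the full-subtractor equations plane by plane, LSB first."""
    D_planes = []
    B = np.zeros_like(X_planes[0])  # no borrow enters the least significant plane
    for X, R in zip(X_planes, R_planes):
        D = (~X & ~R & B) | (~X & R & ~B) | (X & ~R & ~B) | (X & R & B)
        B = (~X & ~R & B) | (~X & R & ~B) | (~X & R & B) | (X & R & B)
        D_planes.append(D)
    return D_planes, B

if __name__ == "__main__":
    xs = np.array([[9, 12], [7, 15]])
    rs = np.array([[5, 3], [7, 0]])
    D_planes, borrow = stack_coded_subtract(to_planes(xs, 4), to_planes(rs, 4))
    print(from_planes(D_planes))
```

Each pass touches one whole bit plane, so the number of passes grows with the word size k while every pixel position (one number per pixel) is processed in parallel.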

Appendix  C.  Subtraction  of  Binary  Symbol-Coded 
Numbers 

As with addition, we use four symbolic substitution rules [Fig. 21(a)], but Rules 1 and 4 are not necessary for single-pixel coding. The symbolic substitution system using single-pixel coding for binary subtraction can be realized as

$Y(t_0) = X,$  (60)

$Y(t_{j+1}) = \bigcup_{i=1}^{4} \{[Y(t_j) \circledast R^{(i)}] \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=1}^{4} \{\overline{[\overline{Y(t_j)} \oplus \check{R}_1^{(i)}] \cup [Y(t_j) \oplus \check{R}_2^{(i)}]} \cap M\} \oplus Q^{(i)}$

$\qquad = \bigcup_{i=2}^{3} \{\overline{[\overline{Y(t_j)} \oplus \check{R}_1^{(i)}] \cup [Y(t_j) \oplus \check{R}_2^{(i)}]} \cap M\} \oplus Q^{(i)},$  (61)

where $Y(t_{k+1})$ is the result of the subtraction; $j = 0, 1, 2, \ldots, k$; $k$ is the word size (i.e., the number of bits in an operand); and $R^{(i)} = (R_1^{(i)}, R_2^{(i)})$ and $Q^{(i)}$ are shown in Fig. 21(b) and represented as

(1) $R_1^{(1)} = \emptyset$, $R_2^{(1)} = \bigcup_{i=0}^{1} B^{-i}$, $Q^{(1)} = \emptyset$,

(2) $R_1^{(2)} = B^{-1}$, $R_2^{(2)} = I$, $Q^{(2)} = I \cup A^{-1}B^{-1}$,

(3) $R_1^{(3)} = I$, $R_2^{(3)} = B^{-1}$, $Q^{(3)} = I$,

(4) $R_1^{(4)} = \bigcup_{i=0}^{1} B^{-i}$, $R_2^{(4)} = \emptyset$, $Q^{(4)} = \emptyset$,

where the null image $\emptyset$ and the elementary images are as defined in Subsec. 2.1; and the mask M [Fig. 21(c)] is a shifting of the mask for binary addition. Because $Q^{(1)}$ and $Q^{(4)}$ are null images, and the dilation of a null image is a null image, Rules 1 and 4 are not needed for simple intensity coding. Figure 21(d) gives an example. The execution time for the DOCIP is $t(k) \le 11(k + 1) = O(k)$.

Similar to binary addition, we can develop symbolic substitution binary subtraction algorithms with BIA representations for coding a symbol with two or six pixels. However, four symbolic substitution rules are still required because $Q^{(1)}$ and $Q^{(4)}$ will not be equal to the null image. The DOCIPs take approximately the same execution time for binary subtraction using two-pixel or six-pixel coding as for binary addition.

Fig. 21. Parallel subtraction of binary symbol-coded numbers: (a) four symbolic substitution rules for subtraction; (b) reference image pairs $R^{(i)}$ and reference images $Q^{(i)}$, i = 1,2,3,4, used for subtraction; because $Q^{(1)}$ and $Q^{(4)}$ are null images, Rules 1 and 4 are not needed for single-pixel coding; (c) mask M; (d) example of parallel subtraction of binary symbol-coded numbers.


References 

1. A. A. Sawchuk and T. C. Strand, "Digital Optical Computing," Proc. IEEE 72, 758 (1984).

2. Special Issue on Optical Computing, Proc. IEEE 72, No. 7 (1984).

3. A. Huang, Y. Tsunoda, J. W. Goodman, and S. Ishihara, "Optical Computation Using Residue Arithmetic," Appl. Opt. 18, 149 (1979).

4. D. Psaltis and D. Casasent, "Optical Residue Arithmetic: A Correlation Approach," Appl. Opt. 18, 163 (1979).

5. F. A. Horrigan and W. W. Stoner, "Residue-Based Optical Processor," Proc. Soc. Photo-Opt. Instrum. Eng. 185, 19 (1979).

6. A. M. Tai, I. Cindrich, J. R. Fienup, and C. C. Aleksoff, "Optical Residue Arithmetic Computer with Programmable Computation Modules," Appl. Opt. 18, 2812 (1979).

7. G. Abraham, "Multiple-Valued Logic for Optoelectronics," Opt. Eng. 25, 8 (1986).

8. T. T. Tao and D. M. Campbell, "Multiple-Valued Logic: An Implementation," Opt. Eng. 25, 14 (1986).

9. R. Arrathoon and S. Kozaitis, "Shadow Casting for Multiple-Valued Associative Logic," Opt. Eng. 25, 29 (1986).

10. S. L. Hurst, "Multiple-Valued Threshold Logic: Its Status and Its Realization," Opt. Eng. 25, 44 (1986).

11. B. K. Jenkins, A. A. Sawchuk, T. C. Strand, R. Forchheimer, and B. H. Soffer, "Sequential Optical Logic Implementation," Appl. Opt. 23, 3455 (1984).

12. B. K. Jenkins, P. Chavel, R. Forchheimer, A. A. Sawchuk, and T. C. Strand, "Architectural Implications of a Digital Optical Processor," Appl. Opt. 23, 3465 (1984).

13. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "Binary Image Algebra and Digital Optical Cellular Image Processors," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 20-23.

14. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "A Cellular Hypercube Architecture for Image Processing," Proc. Soc. Photo-Opt. Instrum. Eng. 829, 331 (1987).

15. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "Optical Cellular Logic Architectures Based on Binary Image Algebra," in Proceedings, IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, Seattle (Oct. 1987), pp. 19-26.

16. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "Binary Image Algebra and Optical Cellular Logic Processor Design," Comput. Vision Graphics Image Process. (Feb. 1989).

17. A. Huang, "Parallel Algorithms for Optical Digital Computers," in Technical Digest, IEEE Tenth International Optical Computing Conference (1983), pp. 13-17.

18. K. Brenner and A. Huang, "An Optical Processor Based on Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1985), paper WA4.

19. K.-H. Brenner, A. Huang, and N. Streibl, "Digital Optical Computing with Symbolic Substitution," Appl. Opt. 25, 3054 (1986).

20. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "Binary Image Algebra Representations of Optical Cellular Logic and Symbolic Substitution," in Technical Digest of 1987 Annual Meeting (Optical Society of America, Washington, DC, 1987).

21. D. Psaltis and R. A. Athale, "High Accuracy Computation with Linear Analog Optical Systems: A Critical Study," Appl. Opt. 25, 3071 (1986).

22. J. Serra, Image Analysis and Mathematical Morphology (Academic, New York, 1982).

23. K. S. Huang, B. K. Jenkins, and A. A. Sawchuk, "Programming a Digital Optical Cellular Image Processor," in Technical Digest of 1987 Annual Meeting (Optical Society of America, Washington, DC, 1987).

24. B. K. Jenkins and A. A. Sawchuk, "Optical Cellular Logic Architectures for Image Processing," in Proceedings, IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, Florida (Nov. 1985), pp. 61-65.

25. K.-H. Brenner, "New Implementation of Symbolic Substitution Logic," Appl. Opt. 25, 3061 (1986).

26. K.-H. Brenner and G. Stucke, "Programmable Optical Processor Based on Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 6-8.

27. J. N. Mait and K.-H. Brenner, "Optical Systems for Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 12-15.

28. T. J. Cloonan, "Strengths and Weaknesses of Optical Architectures Based on Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 12-15.

29. C. D. Capps, R. A. Falk, and T. L. Houk, "Optical Arithmetic/Logic Unit Based on Residue Number Theory and Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 62-65.

30. P. A. Ramamoorthy and S. Antony, "Optical MSD Adder Using Polarization Coded Symbolic Substitution," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 111-114.

31. Ho-In Jeon, "Digital Optical Processor Based on Symbolic Substitution Using Matched Filtering," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1987), pp. 115-118.

32. M. T. Tsao et al., "Symbolic Substitution Using ZnS Interference Filters," Opt. Eng. 26, 41 (1987).


APPLIED  OPTICS  /  Vol  28.  No.  6  /  15  March  1989 


A Cellular Hypercube Architecture for Image Processing


K.  S.  Huang,  B.  K.  Jenkins,  A.  A.  Sawchuk 
Signal  and  Image  Processing  Institute 
Department  of  Electrical  Engineering 
University  of  Southern  California 
Los  Angeles,  CA  90089-0272 

Abstract 

In  this  paper  we  present  a  two-dimensional  cellular  hypercube  architecture  for  image  processing  that  combines 
features  of  the  conventional  hypercube  and  cellular  logic  architectures  for  2-D  computation  cells.  A  unified  theory 
of  parallel  binary  image  processing,  binary  image  algebra  (BIA),  serves  as  a  software  tool  for  designing  parallel 
image  processing  algorithms.  To  match  the  hardware  to  the  software,  we  characterize  the  cellular  processors 
using  the  same  algebraic  structure  as  BIA.  The  two-dimensional  cellular  hypercube  image  processor  is  a  cellular 
SIMD machine with $N^2$ cells and has a simple overall organization, low cell complexity and fast processing ability.
An  optical  cellular  hypercube  implementation  of  BIA  is  proposed  which  offers  parallel  input/output  and  global 
interconnection  capabilities  which  are  difficult  to  do  in  planar  VLSI  technology. 

1.  Introduction 


Image processing and image analysis tasks have large data processing requirements and inherent parallelism. Parallel cellular logic architectures are generally considered appropriate models for parallel image processing. The cellular logic computer was first inspired by the writings of von Neumann [1][2] on cellular automata. The first highly parallel cellular image processor was suggested by Unger [3][4]. Unger proposed and, later, simulated a two-dimensional array of modules (or processing elements or cells) as a natural spatial computer architecture for image processing and recognition. In this approach, each computational cell is responsible for one pixel (or one element of an image) together with its neighboring pixels. A cellular logic (or neighboring logic) operation is then defined as a transform of an array of data X(i,j) into a new array of data X'(i,j), where each element in the new array has a value determined only by the corresponding element in the original array along with the values of its neighbors. Fig. 1 shows a typical conventional nearest-neighbor connected cellular logic architecture. Reviews of cellular image processors can be found in Refs. [5]-[8].
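
As a concrete instance of such a cellular logic operation, the following minimal sketch computes a new binary array in which each element depends only on the corresponding input element and its four nearest neighbors. The chosen rule (a logical OR over the 4-connected neighborhood) and the function name `four_connected_or` are illustrative assumptions, not an operation singled out by the paper.

```python
import numpy as np

def four_connected_or(X):
    """X'(i,j) = X(i,j) OR any of its 4-connected neighbors; the border sees zeros."""
    P = np.pad(X, 1)  # surround the image with a one-pixel border of zeros
    return (P[1:-1, 1:-1]      # the cell itself
            | P[:-2, 1:-1]     # neighbor above
            | P[2:, 1:-1]      # neighbor below
            | P[1:-1, :-2]     # neighbor to the left
            | P[1:-1, 2:])     # neighbor to the right

if __name__ == "__main__":
    X = np.zeros((5, 5), dtype=bool)
    X[2, 2] = True
    print(four_connected_or(X).astype(int))
```

Every output pixel is computed from the same fixed neighborhood, so on a cellular array all cells could evaluate the rule simultaneously.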


Figure 1: A nearest-neighbor connected cellular logic architecture. Each rectangular box represents a computation cell. The figure legend distinguishes connections in the 4-connected cellular array from connections in the 8-connected cellular array.

Figure 2: A conventional hypercube (4-cube) laid out in two-dimensional space. Its interconnections have no spatial invariance.

One important problem in cellular image processing is that the nearest-neighbor connected cellular image processors have poor communication capabilities and no unified systematic theory for both parallel image processing algorithms and architectures. Section 2 suggests a cellular hypercube architecture to improve the communication capabilities of cellular logic computers. Section 3 summarizes a binary image algebra (BIA) to serve as a unified binary image processing theory for both algorithms and architectures. Section 4 discusses a digital optical cellular hypercube image processor, DOCIP-hypercube, for efficiently implementing BIA.
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2.  Cellular  Hypercube  Architecture 


Conventional nearest-neighbor connected cellular arrays have poor communication capabilities; their performance is primarily limited by their $O(1)$ interconnectivity. To improve this while preserving a reasonable number of interconnections, a conventional hypercube ideally increases the interconnectivity to $O(\log_2 M)$ for $M$ processing elements (PEs). (We refer to a PE in a cellular computer as a cell, which usually has no address and index registers.) A conventional SIMD hypercube computer comprises $M = 2^l$ PEs, where $l$ is a non-negative integer. All the PEs are synchronized and operate under the control of a single instruction stream. They are indexed 0 through $M - 1$, and the $p$th PE is referred to as $PE(p)$ for $p \in [0, M - 1]$. A hypercube is denoted as an $l$-cube, where $l = \log_2 M$ represents the number of directly connected PEs. Let $p_{l-1} p_{l-2} \ldots p_0$ be the binary representation of $p$, and let $p^{(b)}$ be the number whose binary representation is $p_{l-1} \ldots p_{b+1} \bar{p}_b p_{b-1} \ldots p_0$, where $\bar{p}_b$ is the complement of $p_b$ and $0 \le b < l$. In the hypercube model, $PE(p)$ is connected to those $PE(p^{(b)})$ for $0 \le b < l$ (i.e. a direct connection exists only between processors whose binary indices differ by one bit position), and data can be transmitted from one PE to another in one step only via this interconnection pattern [9]. The worst case for an inter-PE communication requires $\log_2 M$ routes.
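The bit-flip connection rule above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names `hypercube_neighbors` and `routing_steps` are hypothetical and do not come from the paper.

```python
def hypercube_neighbors(p: int, l: int) -> list[int]:
    """Indices of the l PEs directly connected to PE(p) in an l-cube.

    Neighbor b is obtained by complementing bit b of p.
    """
    return [p ^ (1 << b) for b in range(l)]


def routing_steps(p: int, q: int) -> int:
    """Minimum number of routes between PE(p) and PE(q).

    Each route flips one bit, so this is the Hamming distance of p and q.
    """
    return bin(p ^ q).count("1")


if __name__ == "__main__":
    l = 4                      # a 4-cube with M = 16 PEs
    print(hypercube_neighbors(5, l))
    # Worst case over all destinations, starting from PE(0)
    print(max(routing_steps(0, q) for q in range(2 ** l)))
```

For the 4-cube, the worst case found this way equals $\log_2 M = 4$, in line with the statement above.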


[Figure 3 panels: connections in the 4-directed cellular hypercube; connections in the 8-directed cellular hypercube.]

Figure 3. A two-dimensional cellular hypercube (DOCIP-hypercube). Each cell is interconnected with other cells having a relative one-bit difference in coordinate label in the positive or negative x and y directions to achieve a spatially symmetric and invariant interconnection pattern. Only connections from the central cell are shown; all cells are connected identically, so the resulting interconnections are space invariant.

However, when a conventional hypercube is laid out in two-dimensional space (e.g. Fig. 2 gives a 4-cube), its interconnection patterns are not space invariant; such spatial invariance is desirable for image processing and for simple hardware implementation. To include this, we increase the interconnections to make a two-dimensional cellular hypercube (Fig. 3). The cellular hypercube introduces a symmetrical positive and negative index so that each cell is connected with cells having a relative one-bit difference in coordinate label in the positive or negative x and y directions; the numerical difference of addresses of connected cells is nonzero in at most 1 bit. A two-dimensional SIMD cellular hypercube computer consists of $M = N^2 = (2n - 1)^2$ cells, where $n = 2^k$ and $k$ is a non-negative integer. They are indexed $(-n + 1, -n + 1)$ through $(n - 1, n - 1)$, and the $(q,r)$th cell is referred to as $CELL(q,r)$ for $q, r \in [-n + 1, n - 1]$. In the 4-directed cellular hypercube (cellular hypercube4) model, $CELL(q,r)$ is connected to those $CELL(q \pm 2^d, r)$ and $CELL(q, r \pm 2^d)$ for $0 \le d < k$; and in the 8-directed cellular hypercube (cellular hypercube8) model, $CELL(q,r)$ is connected to those $CELL(q \pm 2^d, r)$, $CELL(q, r \pm 2^d)$ and $CELL(q \pm 2^d, r \pm 2^d)$ for $0 \le d < k$. Data can be transmitted from one cell to another in one step only via this interconnection pattern, although it occurs in parallel for each pixel. For $N^2 = (2n - 1)^2$ cells, the worst case for an inter-cell communication requires $2\log_2 n$ or $4\log_2 n$ routes (both are $O(\log_2 N)$) for the 8-directed or 4-directed cellular hypercube, respectively.
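As a rough illustration of the space-invariant connection rule, the following Python sketch lists the cells directly connected to a given cell. The function name and the clipping of neighbors at the array border are assumptions made for this example; the paper does not say how border cells are handled.

```python
def cellular_hypercube_neighbors(q, r, n, k, directions=4):
    """Cells directly connected to CELL(q, r).

    Cell indices run from -n+1 to n-1 in both coordinates. Offsets are
    +/- 2**d for 0 <= d < k along x and y (4-directed), plus the four
    diagonals with the same 2**d step (8-directed). Neighbors falling
    outside the array are dropped (an assumption of this sketch).
    """
    if directions not in (4, 8):
        raise ValueError("directions must be 4 or 8")
    neighbors = set()
    for d in range(k):
        s = 2 ** d
        offsets = [(s, 0), (-s, 0), (0, s), (0, -s)]
        if directions == 8:
            offsets += [(s, s), (s, -s), (-s, s), (-s, -s)]
        for dq, dr in offsets:
            cq, cr = q + dq, r + dr
            if abs(cq) <= n - 1 and abs(cr) <= n - 1:
                neighbors.add((cq, cr))
    return neighbors


if __name__ == "__main__":
    k = 3
    n = 2 ** k                                  # 15 x 15 array of cells
    print(len(cellular_hypercube_neighbors(0, 0, n, k, 4)))
    print(len(cellular_hypercube_neighbors(0, 0, n, k, 8)))
```

Every interior cell sees the same set of offsets, which is the spatial invariance the text asks for; only the number of in-range neighbors changes near the border.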

This cellular hypercube architecture requires a 3-D global interconnection mechanism which is difficult to implement on a planar VLSI chip [7][10][11]. However, in principle, the 3-D interconnection mechanism is realizable by digital optical systems, because the general architectural structure of a digital optical computer is inherently 3-dimensional [12][13]. Thus, a digital optical cellular image processor based on the cellular hypercube architecture (DOCIP-hypercube) is a possible implementation.

To develop a two-dimensional image processor, we face the problem that image processing has no standard unified theory, so the many existing image processing algorithms and architectures lack a common framework. Thus, we first discuss a simple, unified and consistent theory of image processing (covering both algorithms and architectures) in section 3, and then consider its optical implementation on a digital optical cellular hypercube processor, DOCIP-hypercube, in section 4.

3.  Binary  Image  Algebra 

An  algebraic  structure  provides  a  theoretical  framework  of  image  processing  because  algebra  is  a  foundation  of 
mathematics,  computer  language  and  automata  theories.  During  the  past  few  years,  numerous  papers  have  used 
an algebraic approach to aid in image processing [14]-[19]. Here, a binary image algebra (BIA) is summarized to
serve  as  the  software  theory  of  cellular  image  processors. 


3.1  Basic  Definitions 

In general, a binary digital image is defined as a function $f$ mapping each grid point $(x, y)$ of the picture on an orthogonal coordinate system onto the set composed of 1 (white, i.e. image point) and 0 (black, i.e. background point). However, to obtain a more compact parallel representation of a binary image, we can use the coordinates of image points ("1"s) to specify an image. In this paper, an image is treated as the set of coordinates of image points (the set of pixels that have value 1). We begin the description of BIA by defining our artificial universe:

Definition 3.1 The Universal Image: The universal image is a set $W = \{(x, y) \mid x \in Z_n, y \in Z_n\}$, where $Z_n = \{0, \pm 1, \pm 2, \ldots, \pm n\}$ and $n$ is a positive integer. Thus, all images are defined in a $(2n + 1) \times (2n + 1)$ array of points.

Definition 3.2 Image Space: The image space is the power set (the set of all subsets) of the universal image, i.e. $S = P(W)$.

Definition 3.3 Image: A set $X$ is an image if and only if $X$ is an element of the image space $S$, i.e. $X$ is a subimage of the universal image $W$.

Definition 3.4 Image Point: A point $(x, y)$ is an image point of an image $X$ if and only if $(x, y)$ is an element of the set $X$.

Definition 3.5 Image Transformation: A transformation $T$ is an image transformation if and only if $T$ is a function mapping from the image space $S$ to the image space $S$.

Definition 3.6 Three Fundamental Operations

There are three fundamental operations:

1. Complement of an image $X$: $\bar{X} = \{(x, y) \mid (x, y) \in W \wedge (x, y) \notin X\}$

2. Union of two images $X$ and $R$: $X \cup R = \{(x, y) \mid (x, y) \in X \vee (x, y) \in R\}$

3. Dilation of two images $X$ and $R$:

$$X \oplus R = \begin{cases} \{(x_1 + x_2, y_1 + y_2) \in W \mid (x_1, y_1) \in X, (x_2, y_2) \in R\} & (X \neq \emptyset) \wedge (R \neq \emptyset) \\ \emptyset & \text{otherwise} \end{cases}$$

Remark: "$\in$" means "belongs to", "$\wedge$" means "and", "$\vee$" means "or", and "$\emptyset$" is the null image having no image point. Note that $X$ usually represents an input or data image and $R$ is a reference image. We could define other image operations as fundamental operations instead of these three. These three are chosen for their simplicity, simple software design and simple hardware implementation. Figure 4 gives an example of these fundamental operations.
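Since BIA treats an image as a set of coordinates, the three fundamental operations map directly onto set operations. The sketch below is one possible Python rendering; the function names are hypothetical and chosen for this example.

```python
def universal_image(n):
    """W: all points (x, y) with x, y in {0, +/-1, ..., +/-n}."""
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def complement(X, W):
    """Points of W that are not image points of X."""
    return W - X


def union(X, R):
    """Points that belong to X or to R."""
    return X | R


def dilation(X, R, W):
    """All pairwise coordinate sums of X and R that fall inside W.

    Returns the null image if either operand is the null image.
    """
    if not X or not R:
        return set()
    return {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R} & W


if __name__ == "__main__":
    W = universal_image(2)                 # a 5 x 5 universe
    X = {(0, 0), (1, 0)}                   # data image
    R = {(0, 0), (0, 1)}                   # reference image
    print(sorted(dilation(X, R, W)))
    print(len(complement(X, W)))
```

Clipping the dilation to $W$ corresponds to the condition $(x_1 + x_2, y_1 + y_2) \in W$ in the definition.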
Figure 4: An example of fundamental operations: complement $\bar{\ }$, union $\cup$, and dilation $\oplus$.

Definition 3.7 Elementary Images: There are 5 elementary images:

1. $I = \{(0, 0)\}$, consisting of an image point at the origin

2. $A = \{(1, 0)\}$, consisting of an image point right of the origin

3. $A^{-1} = \{(-1, 0)\}$, consisting of an image point left of the origin

4. $B = \{(0, 1)\}$, consisting of an image point above the origin

5. $B^{-1} = \{(0, -1)\}$, consisting of an image point below the origin

In fact, these 5 elementary images could be reduced to 4 elementary images because $I = A^0 = A \oplus A^{-1} = B^0 = B \oplus B^{-1}$.


3.2  Two  Fundamental  Principles 

Two  fundamental  principles  basically  define  the  binary  image  algebra  (BIA).  Before  stating  these  two  principles 
we give some preliminary results. The proofs are omitted here for brevity [19].

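Lemma 1.

$$\overline{(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}}) \cup \bar{I}} = \begin{cases} I & \text{if } X = R \\ \emptyset & \text{otherwise} \end{cases} \qquad \forall X, R \in P(W).$$

Remark: $I = \{(0, 0)\}$ is an elementary image, $\check{R} = \{(-x, -y) \mid (x, y) \in R\}$ is a reflected reference image, and "$\forall$" means "for all".

Theorem 1. Any image transformation $T : P(W) \to P(W)$ can be expressed as

$$T(X) = \bigcup_{i=1}^{k} \left\{ \overline{(\bar{X} \oplus \check{R}_i) \cup (X \oplus \check{\bar{R}}_i) \cup \bar{I}} \oplus Q_i \right\}$$

where $k \le l$, $l$ is the cardinality (i.e. the number of elements) of $P(W)$, and $R_i$ and $Q_i$ are the reference images used to form any desired image transformation.

Remark: $\bigcup_{i=1}^{k} R_i = R_1 \cup R_2 \cup \ldots \cup R_k$.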
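To see how Lemma 1 and Theorem 1 work together, the following Python sketch builds an equality test from complement, union and dilation only, and then uses it as a lookup over pairs $(R_i, Q_i)$. It assumes Lemma 1 has the form: the complement of $(\bar{X} \oplus \check{R}) \cup (X \oplus \check{\bar{R}}) \cup \bar{I}$ equals $I$ when $X = R$ and the null image otherwise. All function names are hypothetical.

```python
def universal_image(n):
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def dilation(X, R, W):
    if not X or not R:
        return set()
    return {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R} & W


def reflect(R):
    """Reflected image: every point (x, y) becomes (-x, -y)."""
    return {(-x, -y) for (x, y) in R}


def equality_test(X, R, W):
    """Returns I = {(0, 0)} if X == R, otherwise the null image.

    The origin lies in (not X) dilated with reflect(R) exactly when some
    point of R is missing from X, and in X dilated with reflect(not R)
    exactly when some point of X is missing from R.
    """
    I = {(0, 0)}
    term1 = dilation(W - X, reflect(R), W)
    term2 = dilation(X, reflect(W - R), W)
    return W - (term1 | term2 | (W - I))


def transform(X, pairs, W):
    """Theorem 1 as a lookup: the union of Q_i over all R_i equal to X."""
    out = set()
    for R_i, Q_i in pairs:
        out |= dilation(equality_test(X, R_i, W), Q_i, W)
    return out


if __name__ == "__main__":
    W = universal_image(2)
    R1, Q1 = {(0, 0), (1, 0)}, {(0, 1)}
    R2, Q2 = {(1, 1)}, {(-1, -1), (2, 2)}
    print(transform(R1, [(R1, Q1), (R2, Q2)], W))
    print(transform({(0, 0)}, [(R1, Q1), (R2, Q2)], W))
```

The lookup needs one pair per input image that should produce a non-null output, which is why $k$ is bounded by the cardinality of $P(W)$.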

Theorem 2. Any image $X$ can be represented as

$$X = \bigcup_{(i,j) \in X} A^i B^j$$

where $A^i B^j = A^i \oplus B^j$, $A^i = \underbrace{A \oplus A \oplus \ldots \oplus A}_{i} = \{(i, 0)\}$ (if $i = -k$ is a negative integer, then $A^{-k} = \underbrace{A^{-1} \oplus A^{-1} \oplus \ldots \oplus A^{-1}}_{k} = \{(-k, 0)\}$), and $A$, $B$, $A^{-1}$, $B^{-1}$ are the elementary images defined in Definition 3.7.

Principle 1. Fundamental Principle of Image Transformations
Any image transformation $T$ can be implemented by using appropriate reference images $R$ and the three fundamental operations: 1. complement $\bar{X}$ of an image $X$, 2. union $\cup$ of two images, 3. dilation $\oplus$ of two images.

Proof. It follows from Theorem 1.


Principle  1  solves  almost  any  problem  in  binary  image  processing/analysis,  especially  in  shape  inspection,  size 
verification,  and  pattern  recognition  with  shift,  scaling  and  rotational  invariance  [19]  [20].  However,  in  reality,  how 
can  we  build  a  computer  that  offers  arbitrary  and  programmable  reference  images  for  dilations?  Do  we  really  need 
a  large  memory  to  store  many  kinds  of  reference  images?  The  answer  is  “no”.  The  second  fundamental  principle 
suggests  an  economical  way  to  accomplish  this. 

Principle 2. Fundamental Principle of Reference Images
Any reference image $R$ can be generated from elementary images ($I$, $A$, $A^{-1}$, $B$, $B^{-1}$) by using the three fundamental operations.

Proof. It follows from Theorem 2.
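Principle 2 can be illustrated by building an arbitrary reference image from the elementary images with repeated dilations and unions, following the representation $X = \bigcup_{(i,j) \in X} A^i B^j$. This is a minimal sketch with hypothetical names (`power`, `build_reference`); it makes no attempt to minimize the number of operations.

```python
I = {(0, 0)}
A, A_INV = {(1, 0)}, {(-1, 0)}
B, B_INV = {(0, 1)}, {(0, -1)}


def universal_image(n):
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def dilation(X, R, W):
    if not X or not R:
        return set()
    return {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R} & W


def power(E, E_inv, i, W):
    """E dilated with itself |i| times (E_inv is used when i < 0)."""
    result = set(I)                       # the zeroth power is I
    step = E if i >= 0 else E_inv
    for _ in range(abs(i)):
        result = dilation(result, step, W)
    return result


def build_reference(points, n):
    """Union of A^i (+) B^j over all requested points (i, j)."""
    W = universal_image(n)
    R = set()
    for i, j in points:
        R |= dilation(power(A, A_INV, i, W), power(B, B_INV, j, W), W)
    return R


if __name__ == "__main__":
    target = {(2, -1), (0, 3), (-3, 0)}
    print(build_reference(target, 4) == target)
```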

Therefore,  by  the  above  principles,  the  algebraic  structure  of  BIA  can  be  defined  as: 

Definition of Binary Image Algebra (BIA)

Binary image algebra is an algebra with an image space $S$ and a family $F$ of finitary operations including 5 elementary images, which are 0-ary operations, and 3 fundamental operations, which are non 0-ary operations. Symbolically,

$$BIA = (P(W); \oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1})$$

i.e. $S = P(W)$ and $F = (\oplus, \cup, \bar{\ }, I, A, A^{-1}, B, B^{-1})$.

Remark: For any integer $k$, a $k$-ary operation on $S$ is defined to be a function $f : S^k \to S$. Thus, a unary (or 1-ary) operation on $S$ is simply a function on $S$ to $S$. A binary (or 2-ary) operation on $S$ is a function on $S^2$ to $S$. For completeness, we define a nullary (or 0-ary) operation on $S$ to be a particular element of $S$.

4. Implementation: DOCIP-hypercube

To  map  algorithms  into  architectures  in  a  transparent  way,  we  use  an  algebraic  approach  for  describing  a 
cellular  image  processor  first.  Then  we  design  the  digital  optical  cellular  image  processors  (DOCIPs)  and  their 
optical  implementation.  The  DOCIP-hypercube,  a  two-dimensional  cellular  hypercube,  uses  optical  parallel  global 
interconnection  capabilities  and  offers  further  improvements  in  speed  and  flexibility. 

4.1  Algebraic  Description 

Having  defined  cellular  automata  and  the  implementation  requirements  of  BIA,  we  describe  the  DOCIP  in  an 
algebraic  way: 

Definition of Cellular Automata

A cellular automaton is an algebra $A = (S; F, N_c)$ where $S$ is the state space which is a set of states, $F$ is a family of transition functions, and $N_c$ is the neighborhood configuration.

Constraints of Implementing BIA:

1. $S \supseteq P(W)$

2. $F \supseteq \{\oplus, \cup, \bar{\ }\}$

3. $N_c \supseteq I \cup A \cup A^{-1} \cup B \cup B^{-1}$ (or $N_c \supseteq A \cup A^{-1} \cup B \cup B^{-1}$)

where "$\supseteq$" means "contains".

Thus, in terms of cellular automata, the DOCIPs have to satisfy the above constraints for realizing BIA. For storing input images and temporary results in a more flexible way, the DOCIPs utilize three memory modules and share the same algebraic structure (except the neighborhood configuration):

$$DOCIP = (P(W^3); \oplus, \cup, \bar{\ }, N_c)$$

where $N_c$ can be one of the following 4 types:

1. DOCIP-array4: each cell connects with its four nearest neighbors and itself, i.e. $N_{array4} = I \cup A \cup A^{-1} \cup B \cup B^{-1}$.

2. DOCIP-array8: each cell connects with its eight nearest neighbors and itself, i.e. $N_{array8} = \bigcup_{i,j = 0, \pm 1} A^i B^j$.

3. DOCIP-hypercube4: each cell connects with those cells in the 4 directions at distances $1, 2, 4, 8, \ldots, 2^k$ from it, i.e. $N_{hypercube4} = \left(\bigcup_{i = 0, \pm 1, \pm 2, \ldots, \pm 2^k} A^i\right) \cup \left(\bigcup_{j = \pm 1, \pm 2, \ldots, \pm 2^k} B^j\right)$.

4. DOCIP-hypercube8: each cell connects with those cells in the 8 directions at distances $1, 2, 4, 8, \ldots, 2^k$ from it, i.e. $N_{hypercube8} = \bigcup_{i,j = 0, \pm 1, \pm 2, \ldots, \pm 2^k} A^i B^j$.

Among these, DOCIP-hypercube4 is preferred because its hardware requirement is simpler than that of DOCIP-hypercube8 while the two have the same order of performance. The DOCIP-array architectures are nearest-neighbor connected but have poor global communication capabilities.


Figure 5: A digital optical cellular image processor (DOCIP) architecture, one implementation of binary image algebra (BIA). The DOCIP-array requires 9 (or 5) control bits for reference image $E_i$. The DOCIP-hypercube requires $O(\log N)$ control bits for reference image $E_i$.


4.2  General  Description 


From the above algebraic description, the DOCIPs have the same algebraic structure and differ only in their neighborhood configurations $N_c$. Thus, they share the same architecture as shown in Fig. 5, but have different configurations of the reference images $E_i$ depending on the optical interconnection network which defines the neighborhood. In practical applications, a larger reference image $R$ can be generated from a set of smaller reference image(s) $E_i$ by a "sequential dilation". If it is possible to decompose $R$ into a sequence $R = E_1 \oplus E_2 \oplus \ldots \oplus E_k$, then

$$X \oplus R = (\ldots((X \oplus E_1) \oplus E_2) \oplus \ldots \oplus E_k).$$

Figure 6 gives an example: a dilation with a polygon (filled) reference image $R$ is implemented by a sequential dilation with four line reference images; it requires $O(L)$ time for the DOCIP-array and $O(\log_2 L)$ time for the DOCIP-hypercube, where $L$ is the maximum length of the four line reference images. This decomposition may not always exist, in which case $R$ can always be decomposed as $R = R_1 \cup R_2 \cup \ldots \cup R_k$, and then

$$X \oplus R = (X \oplus R_1) \cup (X \oplus R_2) \cup \ldots \cup (X \oplus R_k)$$

where each $R_j$ can be composed from the smaller reference images $E_i$.
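The difference between $O(L)$ and $O(\log_2 L)$ time can be seen by building a line reference image with sequential dilations. In the Python sketch below, each doubling step uses a two-point reference image whose points are a distance $2^d$ apart, which a hypercube neighborhood reaches in one step; a nearest-neighbor array can only use unit steps. For simplicity the dilation here is not clipped to $W$, which assumes the images stay away from the border. The function names are hypothetical.

```python
def dilation(X, R):
    """Unclipped dilation (assumes images stay well inside the universe)."""
    if not X or not R:
        return set()
    return {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R}


def line_by_doubling(m):
    """Horizontal line of 2**m points using m dilations (hypercube style)."""
    line = {(0, 0)}
    for d in range(m):
        line = dilation(line, {(0, 0), (2 ** d, 0)})
    return line


def line_by_unit_steps(length):
    """The same kind of line using length - 1 nearest-neighbor dilations."""
    line = {(0, 0)}
    for _ in range(length - 1):
        line = dilation(line, {(0, 0), (1, 0)})
    return line


if __name__ == "__main__":
    X = {(0, 0), (5, 2)}
    E1, E2 = {(0, 0), (1, 0)}, {(0, 0), (0, 1)}
    # Sequential dilation: X (+) (E1 (+) E2) == (X (+) E1) (+) E2
    print(dilation(X, dilation(E1, E2)) == dilation(dilation(X, E1), E2))
    # Union decomposition: X (+) (E1 u E2) == (X (+) E1) u (X (+) E2)
    print(dilation(X, E1 | E2) == dilation(X, E1) | dilation(X, E2))
    # 3 doubling dilations versus 7 unit-step dilations for 8 points
    print(line_by_doubling(3) == line_by_unit_steps(8))
```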


Figure 6: An example of decomposing a dilation with a larger reference image $R$ into a sequential dilation with some smaller reference images $E_i$. It requires $O(N)$ and $O(\log N)$ time for DOCIP-array and DOCIP-hypercube respectively.


The  proposed  DOCIP  as  shown  in  Fig.  5  is  a  cellular  SIMD  machine  and  consists  of  an  array  of  cells  or 
processing  elements  (PEs)  under  the  supervision  of  a  control  unit.  The  control  unit  includes  a  clock,  a  program 
counter,  a  test  and  branch  module  for  feedback  control,  and  an  instruction  decoder  for  storing  instructions  and 
decoding them to supervise cells. The array of cells includes a $1 \times 3 \times N^2$ bit destination selector, three $N \times N \times 1$ bit memories for storing images, a memory selector, and a dilation unit.

The DOCIP shown in Fig. 5 operates as follows: (1) a binary image ($N \times N$ matrix) is selected by the destination selector and then stored in any memory as the instruction specifies; (2) after storing the images (1 to 3 $N \times N$ matrices), these images and their complemented versions are piped into the next stage, which forms the union of any combination of images; (3) the result is sent to a dilation unit where the reference image specified by the instruction is used to control the type of dilation; (4) finally, the dilated image can be output, tested for program control, or fed back to step (1) by the address field of the instruction.
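The four steps above can be sketched as a single machine cycle in Python. This is a simplified software model, not the optical hardware: the function name `docip_cycle`, the ordering of the six select flags (true and complemented output of each memory in turn), and the treatment of the destination flags are all assumptions made for illustration.

```python
def universal_image(n):
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def dilation(X, R, W):
    if not X or not R:
        return set()
    return {(x1 + x2, y1 + y2) for (x1, y1) in X for (x2, y2) in R} & W


def docip_cycle(memories, select, reference, destination, W):
    """One cycle: memory select -> union -> dilation -> store.

    memories    : list of three images (sets of points)
    select      : six flags; select[2*i] picks memory i, select[2*i + 1]
                  picks its complement (assumed ordering)
    reference   : reference image used by the dilation unit
    destination : three flags saying which memories receive the result
    """
    combined = set()
    for i, image in enumerate(memories):
        if select[2 * i]:
            combined |= image
        if select[2 * i + 1]:
            combined |= W - image
    result = dilation(combined, reference, W)
    stored = [set(result) if destination[i] else memories[i] for i in range(3)]
    return stored, result


if __name__ == "__main__":
    W = universal_image(2)
    mem = [{(0, 0)}, set(), set()]
    mem, out = docip_cycle(mem, (1, 0, 0, 0, 0, 0),
                           {(0, 0), (1, 0)}, (0, 1, 0), W)
    print(sorted(out), sorted(mem[1]))
```

Because complement and union happen before the dilation in every cycle, a dilation-free operation is obtained by choosing the reference image $I = \{(0, 0)\}$.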

4.3  Optical  Implementation 

The entire system can be realized by an optical gate array with optical 3-D interconnections [10][20]. Figure 7 shows an optical concept for the DOCIP-hypercube implementation. It embeds an array of cells in an array of optical binary gates and performs interconnections of these gates by an optical hologram. It should be noted that current optical technology has implemented only arrays of moderately large numbers of gates (500 x 500) at very slow (~ms) switching speeds, and alternatively, arrays of small numbers of gates (2 x 2 to 6 x 6) at fast switching speeds (0.1 ps - 50 ps) [21][22]. Ongoing research in a number of laboratories looks promising for eventually providing the needed arrays of large numbers of gates with reasonably fast switching speeds. Alternatively, control of the DOCIP can be easily realized by using an electronic host instead of the optical control unit, since control of SIMD systems is primarily a serial process. The tradeoff is a possible inefficiency in the interfaces between electronic and optical units. Because of this, the all-optical approach may be preferable in the long term. To efficiently utilize optical gates, they can be interconnected with a 2-D optical multiplexing technique in which a common controllable mask is used for all cells. The optical multiplexing technique has the following advantages: 1) the DOCIP will no longer require the broadcasting of instructions from the control unit; instead, all cells fan their outputs into a common controlling mask pixel; 2) it will reduce the number of gates; and 3) each cell has a simple structure, essentially containing only a 3-bit memory with inverting and non-inverting outputs, and a multiple-input OR gate for dilation.


[Figure 7 panels: N x N output side of array of cells (implemented by optical gate array); N x N input side of array of cells (implemented by optical gate array).]

Figure 7: An optical cellular hypercube (DOCIP-hypercube4 or DOCIP-hypercube8). Imaging optics are omitted for clarity. Each cell connects with cells in the 4 directions or 8 directions at distances $1, 2, 4, 8, \ldots, 2^k$ from it by optical 3-D free interconnection. The input and output sides of the optical gate array are interconnected by an optical feedback path and are shown separately for clarity.


To avoid the well-known drawbacks of conventional computers based on von Neumann principles [12][13], the machine in Fig. 5 has one instruction which implements the three fundamental operations of BIA along with fetch and store. This design uses the parallelism of optics to simultaneously execute instructions involving all $N^2$ picture elements.

This single instruction has the following format:

$$(s_1, s_2, \ldots, s_6, n_1, n_2, \ldots, n_k, d_1, d_2, d_3, j_1, j_2, a_1, a_2, \ldots, a_l, b_1, b_2, \ldots, b_l)$$

where $k$ is determined by the chosen neighborhood configuration $N_c$: the DOCIP-array requires $k = 5$ or $k = 9$ bits for controlling the reference image $R$ at a clock cycle, and the DOCIP-hypercube requires $k = O(\log_2 N)$ for $N^2$ cells; $l$ defines the maximum length of a program, $2^l$. The functions of these $11 + k + 2l$ instruction codes are:

•  $s_1, s_2, \ldots, s_6$ select the output from the memory elements;

•  $n_1, n_2, \ldots, n_k$ control the neighborhood mask, i.e. specify the reference image;

•  $d_1$, $d_2$, and $d_3$ select the destination memory for storing the image;

•  $j_1$ and $j_2$ flag an absolute jump or conditional jump;

•  $a_1, a_2, \ldots, a_l$ are the address for a jump; and

•  $b_1, b_2, \ldots, b_l$ are the address of the instruction.
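The field layout listed above can be checked with a short Python sketch that concatenates the fields into one instruction word and confirms the $11 + k + 2l$ width. The bit-string representation and the names `instruction_width` and `encode` are assumptions for illustration; the paper does not specify an encoding beyond the field order.

```python
def instruction_width(k, l):
    """Total bits: 6 select + k mask + 3 destination + 2 jump + 2 * l address."""
    return 11 + k + 2 * l


def encode(s, n, d, j, a, b):
    """Concatenate the fields, given as bit strings, in the listed order."""
    fields = {"s": (s, 6), "d": (d, 3), "j": (j, 2)}
    for name, (bits, size) in fields.items():
        if len(bits) != size:
            raise ValueError(f"field {name} must have {size} bits")
    if len(a) != len(b):
        raise ValueError("jump address and instruction address differ in length")
    word = s + n + d + j + a + b
    if set(word) - {"0", "1"}:
        raise ValueError("fields must contain only 0 and 1")
    return word


if __name__ == "__main__":
    # DOCIP-array4 style mask (k = 5) with 4-bit addresses (l = 4)
    word = encode("100000", "10001", "010", "00", "0011", "0010")
    print(word, len(word), instruction_width(5, 4))
```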

In contrast with the DOCIP-array, the DOCIP-hypercube increases the interconnection complexity to $O(\log_2 N)$, but is able to perform many global operations in $O(\log_2 N)$ time. Compared with conventional array processors having serial or N-parallel input/output, the DOCIP-array will have the same order of performance in local and global operations but will be improved in input/output performance, and in principle could be as low as $O(1)$ in I/O operations. The DOCIP-hypercube will be improved not only in input/output performance but also in global operations. With external memory, it can be demonstrated to be general purpose in the sense that it simulates any Turing machine. One important feature in the design of the DOCIP-array and DOCIP-hypercube is that optical 3-D free interconnection capabilities can be used to reduce the cell hardware requirements as well as solve the global connection and I/O problems which are difficult to solve by planar VLSI technology.


5.  Conclusions 


A two-dimensional cellular hypercube architecture has been proposed that combines the best features of two-dimensional hypercube and cellular logic architectures for image processing. BIA suggests a unified theory of parallel binary image processing for developing parallel algorithms/languages and can be generalized to grey-level images. The DOCIP-hypercube utilizes the parallel communication and global interconnection capabilities of optics for avoiding communication bottlenecks and matching BIA parallel algorithms with the two-dimensional cellular hypercube architecture. The current design of the DOCIP-hypercube has an extremely simple cell organization, with only a 3-bit flip-flop memory and a multiple-input OR gate, which emphasizes binary image processing and reduces the hardware cost. BIA and the DOCIP-hypercube can have many applications in character recognition, industrial inspection, and medical and scientific research, especially morphological image processing. Future work includes an optical experimental demonstration and an analysis of different cell structures with larger grain sizes for developing fast, sophisticated vision algorithms.
2.4  Nonlinear  Optical  Processing  with  Halftones:  Degradation  and  Compensation  Models


This  paper  is  concerned  with  the  halftone  process  used  in  coherent  optical  spatial  filtering  systems
to  provide  general  nonmonotonic  nonlinear  functions.
Nonlinear optical processing with halftones: accurate predictions for degradation and compensation


Ahmad  Armand,  Alexander  A.  Sawchuk,  and  Timothy  C.  Strand 


A  general  analysis  of  the  halftone  process  for  nonlinear  transformations  in  optical  signal  processing  is 
presented.  The  analysis  considers  the  effects  of  the  nonideal  characteristics  of  the  recording  medium.  The 
results  predict  output  errors  due  to  different  parts  of  the  recording  medium  characteristic  curve  for  any 
nonlinear  transformation.  A  synthesis  method  for  a  discrete  halftone  screen  density  profile  is  also  described. 
This  produces  an  optimum  halftone  screen  density  profile  for  any  form  of  recording  medium  characteristic 
curve  and  any  type  of  nonlinearity  in  the  sense  that  it  minimizes  the  mean-square  difference  between  desired 
and  degraded  outputs.  The  results  of  a  computer  simulation  for  logarithmic  and  level  slice  functions  are 
given. 


I.  Introduction 

In halftone nonlinear optical signal processing the continuous tone input picture is transformed into a binary picture by contact printing the continuous input data through a halftone screen onto a high-contrast recording medium. The product of the input and halftone screen transmittances is clipped in the process, giving an array of binary dots whose size is a function of clip level, the input transmittance, and the halftone transmittance profile. The periodic nature of the halftone screen causes each subregion of the binary image corresponding to a constant input intensity to become locally periodic. When placed in the usual coherent optical filtering system, multiple diffraction orders appear in the Fourier transform plane because of the sampled input. The procedure for producing nonlinearities involves use of one diffraction order combined with specially made halftone screens. A filter is placed in the Fourier plane that transmits the light around one diffraction order and blocks everything else. This in effect demodulates the image.1 After the filtered diffraction order in the Fourier plane is inverse transformed, the continuous nonlinearly transformed output appears.

The above process has been formulated with the assumption that a binary recording medium is used in the halftoning step.2 With this assumption, once the output intensity is expressed as a function of input intensity and halftone screen density profile (analysis), we can easily invert the problem and get the halftone screen density profile given the relationship between the output and input intensities (synthesis). Unfortunately, almost all recording media deviate from the binary assumption. This deviation, which is quite small for high-contrast photographic films, is quite noticeable for any real-time spatial light modulators presently available.3,4 Consequently, to utilize the halftone technique accurately, we remove the assumption of a binary recording medium from the mathematical formulation of these processes. Dashiell and Sawchuk modified the results on halftone processing for an ideal recording medium by including the effects of finite slope and saturation density.5 They developed a numerical procedure, valid for monotonic halftone cells, to compensate in advance for some recording medium effects (precompensation). A closed form solution to this problem has been obtained by Batten and Everett6 for some limited nonlinear transformations. The formulation of Dashiell and Sawchuk does not predict the effects of general nonlinear characteristics of the recording medium on the overall nonlinear transformation. Moreover, their formulation is restricted to monotonic halftone cell shapes.

In this paper we present a formulation of the halftone process which considers a recording medium with a characteristic curve of general shape and which predicts the final degradation of the output for any halftone screen cell shape. This formulation is examined for the case of a binary recording medium, and its result is compared to previous derivations. To obtain a general solution to the precompensation problem, an approximate method which considers a discrete halftone screen density profile is described. This gives the halftone screen density profile for any form of recording medium characteristic curve and any type of nonlinearity by minimizing in the mean-square sense the difference between desired and degraded outputs. The results of computer simulation for logarithmic and level slice functions are shown.

II. Degradation

When a general recording medium is used in the halftone process, the amplitude transmittance of the halftoned picture consists of pulses such as shown in Fig. 1 that are not binary. The amplitude, width, and shape of these pulses depend on the input picture density levels, halftone screen density profile, and shape of the characteristic curve of the recording medium. Each group of these pulses corresponds to a constant intensity subregion in the input picture. The period $L$ (Fig. 1) of the halftone screen is chosen to be small compared with the period of the highest spatial frequency component in the input picture. Thus any local region of the amplitude transmittance of the halftoned transparency $t_h(x)$ is approximately a periodic sequence of pulses which can be expanded in a complex Fourier series,

$$t_h(x) = \sum_{k=-\infty}^{\infty} B_k \exp\left(j\frac{2\pi k x}{L}\right), \tag{1}$$

where

$$B_k = \frac{1}{L} \int_0^L t_h(x) \exp\left(-j\frac{2\pi k x}{L}\right) dx. \tag{2}$$

In  the  sum  of  Eq.  (1)  each  term  (denoted  by  k) 
represents  a  grating  diffraction  order.  When  we  pro¬ 
duce  the  Fourier  transform  of  the  halftoned  picture  in 
the  coherent  optical  processor,  these  orders  appear  in 
the  Fourier  plane  as  isolated  spectral  islands.  The 
spatial  filter  in  this  plane  selects  a  single  order.  Hence 
the  resulting  intensity  distribution  Iq  at  the  processor 
output  is 

I0UmJ*)  -  !b,I2 

(3) 

which  relates  the  intensity  at  any  point  of  the  output 
picture  to  the  amplitude  transmittance  of  the  half- 
toned  picture  and  the  selected  order  k.  Now  th(x)  can 
be  related  to  the  input  intensity  /in  as  follows.  Let  the 
local  input  picture  intensity  that  produced  the  above 
train  of  pulses  on  the  halftoned  picture  be  denoted  by 
/in.  If  the  density  variation  of  one  period  of  the  half¬ 
tone  screen  is  represented  by  f(x),  the  intensity  trans¬ 
mitted  by  the  halftone  screen  is  /i„  X  10'/(l).'  If  we  let 
the  amplitude  transmittance  vs  log  exposure  (log/?) 
curve  of  the  recording  medium  be  described  by 
g(logF),  we  can  write 


Fig.  1.  Amplitude  transmittance  of  a  halftoned  transparency  made 
with  a  general  nonbinary  recording  medium. 


t 


Fig.  2.  Characteristic  curve  of  a  binary  recording  medium. 


t„(l)  -  fillogj/i.  X  10-«'>jf  =  g[log/b  -  /(*)).  (4) 

Replacing  this  in  Eq.  (3),  we  have 

“  |  J  jo  SiloKL.  ~  A*)]  dl|  ’  (5) 

which relates the intensity at any point of the output picture to the intensity of the corresponding point in the input picture through a nonlinear integral relationship. When the specific forms of $g(\log E)$ and $f(x)$ are substituted in this relationship and the integral is solved, the overall relation between the output intensity and the input intensity is nonlinear. This nonlinearity depends on $g(\log E)$, $f(x)$, and the value of the order selected.
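The integral relationship of Eq. (5) can be evaluated numerically for any characteristic curve and screen profile. The following is a minimal sketch, not the authors' program: the function names, the midpoint-rule integration, the default sample count, and the example medium are all assumptions introduced here for illustration.

```python
import cmath
import math


def output_intensity(log_i_in, k, g, f, period=1.0, samples=2000):
    """Midpoint-rule estimate of Eq. (5):
    |(1/L) * integral_0^L g(log I_in - f(x)) exp(-j 2 pi k x / L) dx|^2.
    g maps log exposure to amplitude transmittance; f is the screen density.
    """
    dx = period / samples
    total = 0j
    for n in range(samples):
        x = (n + 0.5) * dx
        transmittance = g(log_i_in - f(x))
        total += transmittance * cmath.exp(-2j * math.pi * k * x / period) * dx
    return abs(total / period) ** 2


def binary_medium(log_e, log_threshold=0.0, a=0.0, b=1.0):
    """Hypothetical hard-threshold medium of the kind shown in Fig. 2."""
    return b if log_e > log_threshold else a
```

For example, `output_intensity(1.0, 0, binary_medium, lambda x: 2.0 * x)` evaluates the zero order for a linear screen whose density runs from 0 to 2 over one period of unit length.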

A.  Binary  Recording  Medium 

The characteristic curve of a binary recording medium having a threshold at $\log I_t$ is shown in Fig. 2. Ideally, $a = 0$ and $b = 1$. Note that this form of characteristic curve is applicable to a positive transparency. We could also consider the more familiar negative transparency, although the basic results remain the same. We will choose the positive transparency curve because it is more similar to the characteristic curves for real-time devices. We now simplify the general relationship of Eq. (5) using the characteristic curve of Fig. 2.

B.  Zero  Order 

For the zero-order case $k = 0$, and Eq. (5) becomes

$$I_0(I_{in}, 0) = \left| \frac{1}{L} \int_0^L g[\log I_{in} - f(x)]\, dx \right|^2. \qquad (6)$$

Given $g(\log E)$ as the binary function shown in Fig. 2 and assuming $f(x)$ to be a monotonically increasing function, we have

if $\log I_{in} - f(x) < \log I_t$, or $x > f^{-1}\left(\log \frac{I_{in}}{I_t}\right)$, then $g(\log E) = a$;

if $\log I_{in} - f(x) > \log I_t$, or $x < f^{-1}\left(\log \frac{I_{in}}{I_t}\right)$, then $g(\log E) = b$.

Substituting these relations into Eq. (6), we have

$$I_0(I_{in}, 0) = \left[ a + \frac{b - a}{L} f^{-1}\left(\log \frac{I_{in}}{I_t}\right) \right]^2, \qquad (7)$$

where $f^{-1}(\cdot)$ is the inverse function of $f(\cdot)$. This result is the same as that obtained previously for the case $a = 0$, $b = 1$ (Ref. 2) except for the fact that it is obtained in a more straightforward manner and can easily be generalized.

C.  Nonzero  Order 

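For $k \neq 0$, the above simplifications for the characteristic curve can be used in Eq. (5) to obtain

$$I_0(I_{in}, k) = \left| \frac{1}{L} \left[ b \int_0^{f^{-1}(\log(I_{in}/I_t))} \exp\left(-\frac{j 2\pi k x}{L}\right) dx + a \int_{f^{-1}(\log(I_{in}/I_t))}^{L} \exp\left(-\frac{j 2\pi k x}{L}\right) dx \right] \right|^2. \qquad (8)$$

If we let

$$x_1 = f^{-1}\left(\log \frac{I_{in}}{I_t}\right), \qquad (9)$$

then, because the exponential integrates to zero over a full period,

$$I_0(I_{in}, k) = \frac{(b - a)^2}{\pi^2 k^2} \sin^2\left(\frac{\pi k x_1}{L}\right). \qquad (10)$$

After substituting the expression for $x_1$ we have

$$I_0(I_{in}, k) = \frac{(b - a)^2}{\pi^2 k^2} \sin^2\left[ \frac{\pi k}{L} f^{-1}\left(\log \frac{I_{in}}{I_t}\right) \right]. \qquad (11)$$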

This also agrees with the previously obtained results (Ref. 2). This new analysis not only provides a straightforward derivation of the above results, but also leads directly to the new precompensation techniques described in the next section.
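For a binary recording medium the two closed-form results, Eqs. (7) and (11), reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and clamping the transition point to one screen period is an assumption made here so that inputs outside the density range of the screen give a saturated output.

```python
import math


def transition_point(log_ratio, f_inverse, period=1.0):
    """x1 = f^{-1}(log(I_in / I_t)), clamped to [0, L] (clamping is an assumption)."""
    return min(max(f_inverse(log_ratio), 0.0), period)


def zero_order_binary(log_ratio, f_inverse, period=1.0, a=0.0, b=1.0):
    """Eq. (7): [a + (b - a) x1 / L]^2."""
    x1 = transition_point(log_ratio, f_inverse, period)
    return (a + (b - a) * x1 / period) ** 2


def nonzero_order_binary(log_ratio, k, f_inverse, period=1.0, a=0.0, b=1.0):
    """Eq. (11): ((b - a) / (pi k))^2 sin^2(pi k x1 / L), valid for k != 0."""
    if k == 0:
        raise ValueError("use zero_order_binary for k = 0")
    x1 = transition_point(log_ratio, f_inverse, period)
    return ((b - a) / (math.pi * k)) ** 2 * math.sin(math.pi * k * x1 / period) ** 2
```

Here `log_ratio` stands for $\log(I_{in}/I_t)$ and `f_inverse` is the inverse of the screen density profile, for instance `lambda u: u / 2.0` for a linear screen spanning densities 0 to 2 over a unit period.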

III. Precompensation

Several  methods  are  available  in  practice  for  gener¬ 
ating  halftone  screens  with  a  desired  density  profile. 
Some  methods  are  purely  optical  and  involve  the  pho¬ 
tographic  recording  of  geometrical  shadows  or  diffrac¬ 
tion  patterns  from  ruled  gratings.  Although  this  tech¬ 
nique  produces  continuous  halftone  screens,  it  does 
not  have  the  flexibility  to  produce  precisely  arbitrary 
screens  needed  for  nonlinear  processing. 


Fig.  3.  Step  approximation  to  halftone  screen  density  profile. 


Another  type  of  halftone  screen  density  profile  that 
can  be  generated  in  practice  is  a  step  function  approxi¬ 
mation  to  the  desired  continuous  density.  These  half¬ 
tone  screens  are  generated  by  digital  image  recorders, 
plotting  microdensitometers,  or  step-and-repeat  cam¬ 
eras.  Hence  the  theoretical  accuracy  available  in  de¬ 
signing  the  halftone  screen  density  profile  is  limited  by 
the  practical  limitations  in  making  the  screen.  This 
motivates  the  following  analysis  which  considers  the 
halftone  screen  density  profile  as  a  step  function  ap¬ 
proximation  of  the  ideal  profile.  As  will  be  shown  in 
this  section,  this  assumption  helps  to  simplify  the  for¬ 
mulas  and  allows  us  to  obtain  some  conclusions  which 
cannot  be  drawn  from  the  continuous  density  analysis. 
To  do  so,  we  utilize  a  discrete  density  halftone  screen 
and  derive  optimization  formulas  for  the  zero  and  non¬ 
zero  orders. 

A.  Zero  Order 

Equation  (6)  is  the  general  formula  relating  the  out¬ 
put  intensity  in  the  zero  order  to  the  input  intensity. 
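Let $f(x)$ be approximated as shown in Fig. 3, then Eq. (6) can be written as

$$I_0(I_{in}, 0) = \left| \frac{1}{L} \left[ \int_0^{L_1} g(\log I_{in} - a_1)\, dx + \int_{L_1}^{L_2} g(\log I_{in} - a_2)\, dx + \cdots + \int_{L_{N-1}}^{L_N} g(\log I_{in} - a_N)\, dx \right] \right|^2. \qquad (12)$$

Assuming that the x-axis intervals are all equal, i.e.,

$$L_1 - 0 = L_2 - L_1 = \cdots = L_N - L_{N-1} = \frac{L}{N}, \qquad (13)$$

Eq. (12) reduces to

$$I_0(I_{in}, 0) = \left| \frac{1}{N} \sum_{i=1}^{N} g(\log I_{in} - a_i) \right|^2. \qquad (14)$$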

The  above  formula  gives  the  output  intensity  in  the 
zero  order  as  a  function  of  the  discrete  grey  levels  on 
the  halftone  screen  and  the  characteristic  curve  of  the 
recording  medium. 
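Equation (14) is simple enough to evaluate directly for any list of screen density levels. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the piecewise linear medium is modeled on the curve of Fig. 4 with breakpoints chosen arbitrarily.

```python
def zero_order_discrete(log_i_in, densities, g):
    """Eq. (14): squared mean of g(log I_in - a_i) over the N density steps."""
    n = len(densities)
    mean_transmittance = sum(g(log_i_in - a_i) for a_i in densities) / n
    return mean_transmittance ** 2


def piecewise_linear_medium(log_e, log_i1=-0.15, log_i2=0.15, a=0.0, b=1.0):
    """Hypothetical medium: a below log_i1, b above log_i2, linear in between."""
    if log_e <= log_i1:
        return a
    if log_e >= log_i2:
        return b
    return a + (b - a) * (log_e - log_i1) / (log_i2 - log_i1)
```

Passing a hard-threshold function for `g` reproduces the binary case, while `piecewise_linear_medium` shows how a finite slope changes the output for the same density levels.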

To design the proper halftone screen, we want to find $a_1, a_2, \ldots, a_N$ in Eq. (14), where $g(\cdot)$ and the desired functional relationship of $I_0(I_{in}, 0)$ in terms of $I_{in}$ are given. Although Eq. (14) is an approximate representation of $I_0$, we can require it to be exact for a discrete set of values of $I_{in} = I_m$, where $m = 1, \ldots, M$. Thus in discrete form Eq. (14) can be written as

$$I_0(I_m, 0) = \left| \frac{1}{N} \sum_{i=1}^{N} g(\log I_m - a_i) \right|^2, \quad m = 1, \ldots, M. \qquad (15)$$


One procedure to find the optimum $a_i$ terms is to minimize the mean square error expression

$$\min_{a_1, a_2, \ldots, a_N} E = \sum_{m=1}^{M} \left\{ I_0(I_m, 0) - \left[ \frac{1}{N} \sum_{i=1}^{N} g(\log I_m - a_i) \right]^2 \right\}^2. \qquad (16)$$

This should produce values of $a_i$ that bring the output intensity in the zero order as close as possible to the desired output intensity in the mean square sense.

Now, because the $a_i$ terms are the different density values on the halftone screen, we constrain their values to lie within certain limits, as expressed by

$$c \le a_i \le d, \qquad (17)$$

where $c$ and $d$ are given non-negative constants. This makes the optimization problem one of constrained minimization. We want a transformation from $[-\infty, \infty]$ to $[c, d]$. The functions $\sin^2$ and $\cos^2$ are examples of functions that transform $[-\infty, \infty]$ to $[0, 1]$. We arbitrarily choose the $\sin^2$ function. If we let (Ref. 7)

$$a_i = (d - c) \sin^2 y_i + c, \qquad (18)$$

this limits the values of the $a_i$ to the range $[c, d]$. With this change of variable, Eq. (16) now becomes

$$\min_{y_1, y_2, \ldots, y_N} E = \sum_{m=1}^{M} \left\{ I_0(I_m, 0) - \left[ \frac{1}{N} \sum_{i=1}^{N} g[\log I_m - (d - c) \sin^2 y_i - c] \right]^2 \right\}^2, \qquad (19)$$

which gives the values of the $y_i$ terms. The corresponding $a_i$ terms can then be found from Eq. (18).
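The change of variable in Eq. (18) turns the bounded problem into an unbounded one, since any real $y_i$ maps to a density inside $[c, d]$. A minimal sketch of the mapping and its inverse follows; the function names are hypothetical, and the inverse returns only the principal value in $[0, \pi/2]$.

```python
import math


def density_from_y(y, c, d):
    """Eq. (18): a = (d - c) sin^2(y) + c, always inside [c, d]."""
    return (d - c) * math.sin(y) ** 2 + c


def y_from_density(a, c, d):
    """Principal-value inverse of Eq. (18); requires c <= a <= d and c < d."""
    if not c <= a <= d:
        raise ValueError("density outside the allowed range [c, d]")
    return math.asin(math.sqrt((a - c) / (d - c)))
```

The inverse is what is needed to turn an initial set of density levels into starting values for an unconstrained minimizer.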

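In going from Eq. (16) to Eq. (19) we should make sure that the minimum of $E$ with respect to the $a_i$ terms is in fact the same as the minimum of $E$ with respect to the $y_i$ terms. To check this, note that when we minimize $E$ with respect to $y_i$, we set

$$\frac{\partial E}{\partial y_i} = 0, \quad i = 1, \ldots, N, \qquad (20)$$

which is equivalent to

$$\frac{\partial E}{\partial a_i} \frac{\partial a_i}{\partial y_i} = 0, \quad i = 1, \ldots, N, \qquad (21)$$

from which we have

$$\frac{\partial E}{\partial a_i} = 0, \quad i = 1, \ldots, N, \qquad (22)$$

or

$$\frac{\partial a_i}{\partial y_i} = 0, \quad i = 1, \ldots, N. \qquad (23)$$

Hence to have Eq. (22) true when Eq. (20) is true, we should avoid satisfying Eq. (23). To check for this condition note that

$$\frac{\partial a_i}{\partial y_i} = (d - c) \sin 2 y_i, \qquad (24)$$

which when set to zero gives

$$y_i = 0 \rightarrow a_i = c, \quad i = 1, \ldots, N, \qquad (25)$$

$$y_i = \frac{\pi}{2} \rightarrow a_i = d, \quad i = 1, \ldots, N. \qquad (26)$$

It can be seen that the only values of $a_i$ in the range $[c, d]$ that make $\partial a_i / \partial y_i = 0$ are the boundaries of this interval. This can be prevented from causing a problem by choosing $c$ and $d$ so that the interval $[c, d]$ contains the limit values for the $a_i$ terms.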

As an example of a function possible with the zero order consider the design of a logarithmic screen. In a logarithmic process we want the relationship

$$I_0(I_{in}, 0) = p \cdot \log(I_{in}/I_t) \qquad (27)$$

between the output and input intensities where $p$ is a constant and $I_t$ is shown in Fig. 2. Hence, to find the optimum screen giving the above logarithmic relationship with a recording medium characteristic curve $g(\log E)$, we must perform the minimization

$$\min_{y_1, y_2, \ldots, y_N} E = \sum_{m=1}^{M} \left\{ p \cdot \log(I_m/I_t) - \left[ \frac{1}{N} \sum_{i=1}^{N} g[\log I_m - (d - c) \sin^2 y_i - c] \right]^2 \right\}^2. \qquad (28)$$

For the initial values of the above minimization we will use the halftone screen density profile values obtained for a binary recording medium. This helps the minimization converge more rapidly, particularly when the recording medium response is close to binary. In practice, we do not expect to perform halftone nonlinear processing with a very low gamma recording medium.

To obtain the halftone screen density profile for the logarithmic process with a binary recording medium as shown in Fig. 2, we equate Eqs. (7) and (27) to get

$$p \log\left(\frac{I_{in}}{I_t}\right) = \left[ a + \frac{b - a}{L} f^{-1}\left(\log \frac{I_{in}}{I_t}\right) \right]^2 \qquad (29)$$

and substitute $u = \log(I_{in}/I_t)$ in the above equation to obtain

$$p \cdot u = \left[ a + \frac{b - a}{L} f^{-1}(u) \right]^2. \qquad (30)$$

Because log is a monotonic function, the corresponding halftone screen density profile will also be a monotonic function. Hence $f$ is a monotonic function so we can write $f[f^{-1}(u)] = u$. Using this in Eq. (30) we have

$$p \cdot f[f^{-1}(u)] = \left[ a + \frac{b - a}{L} f^{-1}(u) \right]^2. \qquad (31)$$

Now let $x = f^{-1}(u)$ in the above equation to obtain

$$f(x) = \frac{1}{p} \left[ a + \frac{b - a}{L} x \right]^2, \quad 0 \le x \le L. \qquad (32)$$

Hence the initial values for the $a_i$ terms are

$$a_i = \frac{1}{p} \left[ a + (b - a) \frac{i}{N} \right]^2, \quad i = 1, \ldots, N. \qquad (33)$$

From Eq. (18) the corresponding $y_i$ terms are

$$y_i = \sin^{-1} \left\{ \left[ \frac{1}{d - c} \left( \frac{1}{p} \left[ a + (b - a) \frac{i}{N} \right]^2 - c \right) \right]^{1/2} \right\}. \qquad (34)$$

The above $y_i$ terms will now initialize the minimization procedure of Eq. (28) to obtain the corresponding $y_i$ terms for a nonbinary recording medium.

Fig. 4. Characteristic curve of a piecewise linear recording medium.

A computer algorithm was written which performs the above minimization. It uses the ZXMIN subroutine from the IMSL library (Ref. 8). This subroutine is based on a quasi-Newton algorithm for finding the minimum of a function of N variables (Ref. 9). In the quasi-Newton method we do not find the minimum of $E$ by solving Eq. (20) directly. Rather we use an iterative procedure which starts from the initial point and uses Eq. (20) to get as close as possible to the minimum. Thus we do not expect to produce the undesired solution of Eq. (23) as the minimum.
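The overall procedure (start from the binary-medium profile, then iteratively reduce $E$ in the $y_i$ variables) can be sketched without the IMSL library. The code below is not ZXMIN and not a quasi-Newton method: it substitutes a plain finite-difference gradient descent with step halving, chosen only because it needs nothing beyond the standard library. All names, the step rule, the sampling of the profile at $x = iL/N$, and the medium breakpoints are assumptions; log intensities are measured relative to the threshold, so $\log I_t = 0$.

```python
import math


def piecewise_linear_medium(log_e, log_i1=-0.15, log_i2=0.15, a=0.0, b=1.0):
    """Hypothetical medium in the style of Fig. 4."""
    if log_e <= log_i1:
        return a
    if log_e >= log_i2:
        return b
    return a + (b - a) * (log_e - log_i1) / (log_i2 - log_i1)


def screen_error(y, log_inputs, targets, g, c=0.0, d=2.0):
    """Mean square error of Eq. (19) for the zero order."""
    densities = [(d - c) * math.sin(v) ** 2 + c for v in y]
    n = len(densities)
    err = 0.0
    for log_i, target in zip(log_inputs, targets):
        mean_t = sum(g(log_i - a_i) for a_i in densities) / n
        err += (target - mean_t ** 2) ** 2
    return err


def binary_start(n, p, c=0.0, d=2.0, a=0.0, b=1.0):
    """Starting y values from the binary-medium logarithmic profile."""
    ys = []
    for i in range(1, n + 1):
        density = (a + (b - a) * i / n) ** 2 / p
        ratio = min(1.0, max(0.0, (density - c) / (d - c)))
        ys.append(math.asin(math.sqrt(ratio)))
    return ys


def descend(objective, y0, iterations=30, h=1e-6):
    """Gradient descent with step halving; the error never increases."""
    y = list(y0)
    best = objective(y)
    for _ in range(iterations):
        grad = []
        for i in range(len(y)):
            probe = list(y)
            probe[i] += h
            grad.append((objective(probe) - best) / h)
        step, improved = 1.0, False
        while step > 1e-10:
            trial = [v - step * gv for v, gv in zip(y, grad)]
            value = objective(trial)
            if value < best:
                y, best, improved = trial, value, True
                break
            step *= 0.5
        if not improved:
            break
    return y, best
```

A typical use builds `targets` as `p * u` for a list of sample values `u` of $\log(I_m/I_t)$, wraps `screen_error` in a one-argument function of `y`, and passes it to `descend` together with `binary_start(30, p)`. Because a step is accepted only when it lowers the error, the returned error is never larger than the starting one; how close it gets to the true minimum depends on the step rule and is not guaranteed.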

The results for recording media of the type shown in Fig. 4 with $a = 0$, $b = 1$, and different slopes in the linear part are shown in Figs. 5-8. The slopes are called gamma in the figures but should not be confused with the usual photographic gamma. It is assumed that there are thirty discrete points in the halftone screen density profile, and the density values are between 0 and 2 in these figures. It is also assumed that the value of $\log I_t$ (shown in Fig. 2) lies midway between the values of $\log I_1$ and $\log I_2$ (shown in Fig. 4). Note that the plots of the input-output curves are semilogarithmic, and hence the ideal result is a straight line. In Figs. 5 and 7 the graphs labeled ideal represent the desired relationship between input and output intensities. This can be obtained with the halftone screen density values of Eq. (32) and the binary recording medium characteristic curve of Fig. 2 in Eq. (5). The degraded graph is obtained by using the density values of Eq. (32) and the recording medium of Fig. 4 in Eq. (5). The optimized graph is generated by using, in Eq. (5), the density values obtained from Eq. (28) and the characteristic curve of the recording medium for which those density values were produced.

In Figs. 6 and 8, the ideal graph represents the density profile of a halftone screen for a logarithmic process using a binary recording medium. The optimized graph is the density values obtained from Eq. (28) for a recording medium with the characteristics shown in Fig. 4. It is seen in Fig. 5 that for a gamma of 3.0, there is a significant amount of degradation, and the optimized screen has been successful in removing this degradation. For a gamma of 10, shown in Fig. 7, as one might expect, there is less degradation, and the optimized screen is even more successful in producing the ideal result.

B.  Nonzero  Order 

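The general formula relating the output intensity in the nonzero orders to the input intensity is

$$I_0(I_{in}, k) = \left| \frac{1}{L} \int_0^L g[\log I_{in} - f(x)] \exp\left(-\frac{j 2\pi k x}{L}\right) dx \right|^2 \qquad (35)$$

from Eq. (5). Using a quantized approximation to $f(x)$, as in the zero-order case, we can write (from Fig. 3)

$$I_0(I_{in}, k) = \left| \frac{1}{L} \left[ \int_0^{L_1} g(\log I_{in} - a_1) \exp\left(-\frac{j 2\pi k x}{L}\right) dx + \int_{L_1}^{L_2} g(\log I_{in} - a_2) \exp\left(-\frac{j 2\pi k x}{L}\right) dx + \cdots + \int_{L_{N-1}}^{L_N} g(\log I_{in} - a_N) \exp\left(-\frac{j 2\pi k x}{L}\right) dx \right] \right|^2. \qquad (36)$$

For simplicity we let

$$g(\log I_{in} - a_i) = g_i, \quad i = 1, \ldots, N. \qquad (37)$$

Then

$$I_0(I_{in}, k) = \frac{1}{4\pi^2 k^2} \left| g_1 \left[ \exp\left(-\frac{j 2\pi k (L_1 - 0)}{L}\right) - 1 \right] + \cdots + g_N \exp\left(-\frac{j 2\pi k L_{N-1}}{L}\right) \left[ \exp\left(-\frac{j 2\pi k (L_N - L_{N-1})}{L}\right) - 1 \right] \right|^2. \qquad (38)$$

Now from Eq. (13)

$$L_1 - 0 = L_2 - L_1 = \cdots = L_N - L_{N-1} = L/N, \qquad (39)$$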

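so when Eq. (39) is used in Eq. (38) we have

$$I_0(I_{in}, k) = \frac{1}{4\pi^2 k^2} \left| \left[ \exp\left(-\frac{j 2\pi k}{N}\right) - 1 \right] \sum_{i=1}^{N} g_i \exp\left(-\frac{j 2\pi k L_{i-1}}{L}\right) \right|^2, \quad L_0 = 0. \qquad (40)$$

Note also that

$$L_{i-1} = \frac{(i - 1) L}{N}, \quad i = 1, \ldots, N. \qquad (41)$$

Consequently, using $|\exp(-j 2\pi k / N) - 1|^2 = 4 \sin^2(\pi k / N)$, Eq. (40) can be written as

$$I_0(I_{in}, k) = \frac{\sin^2(\pi k / N)}{\pi^2 k^2} \left| \sum_{i=1}^{N} g(\log I_{in} - a_i) \exp\left[-\frac{j 2\pi k (i - 1)}{N}\right] \right|^2. \qquad (42)$$

Fig. 5. Logarithmic transfer function for a piecewise linear recording medium with gamma = 3.0: (a) ideal; (b) degraded; (c) optimized.

Fig. 6. Halftone cell shape f(x) corresponding to Fig. 5: (a) ideal; (b) optimized.

Fig. 7. Logarithmic transfer function for a piecewise linear recording medium with gamma = 10.0: (a) ideal; (b) degraded; (c) optimized.

Fig. 8. Halftone cell shape f(x) corresponding to Fig. 7: (a) ideal; (b) optimized.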

Equation (42) gives the output intensity in any nonzero order $k$ in terms of the input intensity, the characteristic function of the recording medium, and the discrete grey levels on the halftone screen. To design the proper halftone screen, as in the case of the zero order, we should determine suitable $a_i$ values from Eq. (42). The procedure that we take is the same as the one for the zero order. Namely, we minimize the expression

$$\min_{a_1, a_2, \ldots, a_N} E = \sum_{m=1}^{M} \left\{ I_0(I_m, k) - \frac{\sin^2(\pi k / N)}{\pi^2 k^2} \left| \sum_{i=1}^{N} g(\log I_m - a_i) \exp\left[-\frac{j 2\pi k (i - 1)}{N}\right] \right|^2 \right\}^2. \qquad (43)$$

To limit the resulting density values as in Eq. (17) we transform the problem to the minimization

$$\min_{y_1, y_2, \ldots, y_N} E = \sum_{m=1}^{M} \left\{ I_0(I_m, k) - \frac{\sin^2(\pi k / N)}{\pi^2 k^2} \left| \sum_{i=1}^{N} g[\log I_m - (d - c) \sin^2 y_i - c] \exp\left[-\frac{j 2\pi k (i - 1)}{N}\right] \right|^2 \right\}^2. \qquad (44)$$

The $y_i$ terms are related to the $a_i$ terms through Eq. (18). To initialize the values of the $y_i$ terms for this minimization procedure, we use the density values obtained for the screen when the recording medium is binary.
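The output in a nonzero order for a step screen can be computed by integrating each of the $N$ equal steps of Eq. (36) exactly, which leaves a weighted sum of complex exponentials. The sketch below uses that exact per-step integration; the prefactor $\sin^2(\pi k/N)/(\pi k)^2$ follows from the algebra and is stated here as a derived form rather than a quotation of the printed formula. The function name is hypothetical.

```python
import cmath
import math


def nonzero_order_discrete(log_i_in, k, densities, g):
    """Order-k output intensity for a step screen with N equal-width steps.

    Each step i (0-based) contributes g(log I_in - a_i) * exp(-j 2 pi k i / N);
    the common factor from integrating over one step is
    (sin(pi k / N) / (pi k))^2 after taking the squared magnitude.
    """
    if k == 0:
        raise ValueError("k must be nonzero; use the zero-order formula instead")
    n = len(densities)
    phasor_sum = sum(
        g(log_i_in - a_i) * cmath.exp(-2j * math.pi * k * i / n)
        for i, a_i in enumerate(densities)
    )
    return (math.sin(math.pi * k / n) / (math.pi * k)) ** 2 * abs(phasor_sum) ** 2
```

With a hard-threshold `g` and a screen whose first half transmits while the second half blocks, the first order reaches $1/\pi^2$, which agrees with the continuous binary result when the transition sits at half the period.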

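As an example of a function possible in the nonzero order we consider the design of a level slice screen. To obtain a level slice transformation the first order is a suitable choice. For this function we want

$$I_0(I_{in}, 1) = \begin{cases} 0, & \text{for } I_{in} < I_a, \\ \dfrac{1}{\pi^2}, & \text{for } I_a \le I_{in} \le I_b, \\ 0, & \text{for } I_{in} > I_b. \end{cases} \qquad (45)$$

This function is shown in Fig. 9. The density profile giving this relationship for the binary recording medium shown in Fig. 2 for $a = 0$ and $b = 1$ is

$$f(x) = \begin{cases} \log(I_a / I_t), & \text{for } 0 \le x \le \dfrac{L}{2}, \\ \log(I_b / I_t), & \text{for } \dfrac{L}{2} < x \le L, \end{cases} \qquad (46)$$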


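which is shown in Fig. 10. The corresponding density levels in the discrete screen are

$$a_i = \begin{cases} \log(I_a / I_t), & \text{for } i \le \dfrac{N}{2}, \\ \log(I_b / I_t), & \text{for } i > \dfrac{N}{2}. \end{cases} \qquad (47)$$

Using Eq. (47) in Eq. (18), we have

$$y_i = \begin{cases} \sin^{-1} \left[ \dfrac{\log(I_a / I_t) - c}{d - c} \right]^{1/2}, & \text{for } i \le \dfrac{N}{2}, \\ \sin^{-1} \left[ \dfrac{\log(I_b / I_t) - c}{d - c} \right]^{1/2}, & \text{for } i > \dfrac{N}{2}. \end{cases} \qquad (48)$$

With the initial values for $y_i$ terms as above we then want to minimize

$$\min_{y_1, y_2, \ldots, y_N} E = \sum_{m=1}^{M} \left\{ I_0(I_m, 1) - \frac{\sin^2(\pi / N)}{\pi^2} \left| \sum_{i=1}^{N} g[\log I_m - (d - c) \sin^2 y_i - c] \exp\left[-\frac{j 2\pi (i - 1)}{N}\right] \right|^2 \right\}^2.$$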

A computer routine similar to the one for the logarithmic process has been written to perform the above minimization. With the same assumptions about the recording medium and the halftone screen as with the logarithmic process, the result is shown in Figs. 11-14. Figure 11 shows the ideal level slice function and the results that would be obtained for a recording material with an effective gamma of 3.0. Also shown is the optimized level slice obtained by precompensating the halftone screen profile. It is seen that although the mean square error between the ideal and the actual response curves can be significantly reduced, the finite slope of the leading and trailing edges of the level slice is still limited by the gamma of the recording medium. Figure 12 shows the halftone profile for an ideal recording material and the precompensated halftone profile for a recording material with a gamma of 3. Figures 13 and 14 are the corresponding results assuming a recording material with a gamma of 10, which is much closer to the ideal.

Fig. 9. Level slice function.

Fig. 10. Halftone screen density for the level slice function of Fig. 9.

Fig. 11. Level slice transfer function for a piecewise linear recording medium with gamma = 3.0: (a) ideal; (b) degraded; (c) optimized.

Fig. 12. Halftone cell shape corresponding to Fig. 11: (a) ideal; (b) optimized.

Fig. 13. Level slice transfer function for a piecewise linear recording medium with gamma = 10.0: (a) ideal; (b) degraded; (c) optimized.

Fig. 14. Halftone cell shape corresponding to Fig. 13: (a) ideal; (b) optimized.


IV.  Conclusions 

A  new  formulation  of  the  halftone  nonlinear  pro¬ 
cessing  technique  has  been  presented.  The  formula¬ 
tion  is  general  and  works  for  any  recording  medium 
characteristic  curve  shape  and  any  halftone  screen  cell 
shape.  Thus  one  can  easily  predict  the  amount  of 
degradation  of  the  output  due  to  a  nonbinary  charac¬ 
teristic  curve  of  the  recording  medium.  This  is  partic¬ 
ularly  useful  for  real-time  realization  of  the  halftone 
processing.  In  this  case,  a  real-time  image  transducer 
is  used  as  the  recording  medium.  Because  the  pres¬ 
ently  developed  devices  do  not  possess  the  desired 
sharp  threshold  characteristic,  halftone  screen  pre¬ 
compensation  is  necessary. 

The  problem  of  the  design  of  the  halftone  screen 
density  profile  with  a  nonideal  recording  medium  has 
been  solved  by  an  approximate  method  which  obtains 
the  halftone  screen  density  profile  by  minimizing  the 
difference  in  a  mean  square  sense  between  desired  and 
degraded  outputs.  Results  of  computer  simulations 
for  logarithmic  and  level-slice  transformations  have 
been  given.  The  results  show  that  for  smooth  nonlin¬ 
earities  like  the  logarithmic  function,  it  is  possible  to 
compensate  for  the  nonideal  characteristic  of  the  re¬ 
cording  medium.  For  nonlinearities  with  sharp  jumps 
like  the  level-slice  function,  the  compensation  is  less 
successful.

2.5 Optical Symbolic Substitution and Pattern Recognition


The  attached  paper  “Optical  Symbolic  Substitution  and  Pattern  Recognition  Algorithms  Based 
on  Binary  Image  Algebra”  by  K.S.  Huang,  B.K.  Jenkins  and  A. A.  Sawchuk  from  ICO  Topical 
Meeting  on  Optical  Computing,  Toulon,  France,  August  29  -  September  2,  1988  describes  the 
application of binary image algebra to pattern recognition systems for cellular processors.

Summary

Binary image algebra (BIA), a unified systematic complete theory of parallel binary image processing [1], also provides a unified spatial logic of digital optical computing for describing symbolic substitution, cellular logic and Boolean logic in parallel [2]. Symbolic substitution has been used to implement logic, arithmetic, communication and the simulation of a Turing machine [3]; but its implementation of some operations (e.g. parallel binary arithmetic) is relatively complicated compared to other BIA implementations [2]. In this paper we further suggest some BIA algebraic techniques and pattern recognition algorithms, including a shift, scale and rotation invariant algorithm, to improve the speed, flexibility and complexity of symbolic substitution.

A symbolic substitution rule involves two steps: 1) recognizing the locations of a certain spatial search-pattern within the 2-D input data, and 2) substituting a new replacement-pattern wherever the search-pattern is recognized. As illustrated in Fig. 1, BIA can be used to realize a symbolic substitution rule defined by:

$$(X \circledast R) \oplus Q = ((X \ominus R_1) \cap (\bar{X} \ominus R_2)) \oplus Q = \overline{(\bar{X} \oplus \check{R}_1) \cup (X \oplus \check{R}_2)} \oplus Q \qquad (1)$$

where $X$ is the 2-D input data, $R = (R_1, R_2)$ is the reference image pair corresponding to the search-pattern ($R_1$ and $R_2$ define the foreground and the background of the search-pattern respectively), $\check{R}$ defines a reflected reference image given by $\check{R} = \{(-x, -y) \mid (x, y) \in R\}$, $Q$ is the reference image corresponding to the replacement-pattern, "$\circledast$" denotes the hit or miss transform which is the pattern recognizer, "$\ominus$" denotes the erosion operation, and "$\oplus$" denotes the dilation operation which is the pattern replacement operator. To work with more than one rule (say $p$ substitution rules) for practical applications, a symbolic substitution system (Fig. 2) produces several copies of the input $X$, provides $p$ different recognizer-substituter units, and then combines the outputs of various units to form a new output. Thus, a symbolic substitution system is implemented by

$$\bigcup_{i=1}^{p} (X \circledast R^{(i)}) \oplus Q^{(i)} \qquad (2)$$

where $R^{(i)}$ and $Q^{(i)}$, $i = 1, 2, \ldots, p$, are the reference image pairs and replacement patterns in the $i$th symbolic substitution rule. This, then, is the BIA formula for general symbolic substitution.
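A single substitution rule, recognition by the hit or miss transform followed by replacement by dilation, can be sketched with images held as sets of foreground coordinates. This is an illustration only and says nothing about how the optical processor carries out the operations. The function names and the finite search window are assumptions; points outside the stored set are treated as background.

```python
def dilate(image, reference):
    """Dilation: translate the image by every point of the reference and unite."""
    return {(x + rx, y + ry) for (x, y) in image for (rx, ry) in reference}


def hit_or_miss(image, foreground, background, width, height):
    """Positions inside the width x height window where every foreground offset
    lands on an image point and every background offset lands off the image."""
    hits = set()
    for x in range(width):
        for y in range(height):
            fg_ok = all((x + rx, y + ry) in image for rx, ry in foreground)
            bg_ok = all((x + rx, y + ry) not in image for rx, ry in background)
            if fg_ok and bg_ok:
                hits.add((x, y))
    return hits


def substitute(image, foreground, background, replacement, width, height):
    """One symbolic substitution rule: recognize, then write the replacement."""
    hits = hit_or_miss(image, foreground, background, width, height)
    return dilate(hits, replacement)
```

For instance, with the search-pattern "a foreground point whose right neighbor is background" (`foreground=[(0, 0)]`, `background=[(1, 0)]`) and the replacement `[(0, 0), (1, 0)]`, every recognized point is rewritten as a horizontal pair. A system with several rules would take the union of the outputs of `substitute` over all rules.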

However, in many cases the above form is inefficient and can be reduced to a relatively simpler form or implemented in a more efficient way by using some BIA algebraic techniques. Here are some examples: 1) the full recognition can be implemented by only the background or foreground recognition under certain conditions; 2) if $Q^{(i)} = \emptyset$, the $i$th symbolic substitution rule in Eq. (2) is not needed (e.g. the four rules of binary subtraction in simple intensity coding of arithmetic data can be reduced to only two rules [2]); and 3) if $Q^{(i)} = Q$ for all $1 \le i \le p$ (this happens in those cases that a class of search-patterns is defined by a set of reference image pairs $R^{(i)}$, $i = 1, 2, \ldots, p$), we should combine the results of the hit or miss transforms first and then replace them by the same replacement-pattern $Q$ instead of implementing $p$ substitution units for realizing the same substitution step, i.e.

$$\left( \bigcup_{i=1}^{p} X \circledast R^{(i)} \right) \oplus Q. \qquad (3)$$

The practical difficulty with the implementation in Eqs. (2) and (3) is that the hit or miss transform is only efficient for the shift invariant recognition and would require a large number of intricate reference image pairs to perform the recognition step in the presence of changes in scale, rotation or both. Thus, it might be too costly to implement scale and rotation invariant recognition of intricate patterns for symbolic substitution based on the above formula. For example, if we want to substitute all "square patterns" in an input image by the same character "S", it would be very inefficient to use the above symbolic substitution implementation techniques.

To solve this kind of scale and rotation invariant problem, here we recognize all the desired patterns by reversing the growing procedure of a family of patterns. This family defines all patterns in the presence of changes in scale, rotation or both, and the reversal transforms all the desired patterns into their original seeds, which are isolated single image points. We have developed a description of this procedure in terms of BIA. For brevity, here we describe only the case of shift and scale invariant recognition. Suppose we want to recognize all square patterns with different scales and locations in the input image $X$ (e.g. Fig. 3(a)) and to produce the output image $Y$ (e.g. Fig. 3(b)). The procedure is: 1) determine a growing sequence of the desired patterns $T_i$ (e.g. Fig. 3(c)), where $0 \le i \le m$ and the largest size of the desired patterns is $m \times m$; 2) find a small set of good reference image pairs $\{R^{(q)}\}$ (e.g. Fig. 3(d) has only 3 small reference image pairs for recognizing all square objects with different scales) satisfying some criteria, where each reference image pair in $\{R^{(q)}\}$ corresponds to a possible neighborhood of a given foreground image point in a pattern $T_i$, $1 \le i \le m$, whose previous state in the pattern $T_{i-1}$ is a background point; 3) transform the desired patterns $T_i$, $i = 1, 2, \ldots, m$, in the 2-D input image $X = X(t_0)$ into their original seeds (i.e. $T_0$ which contains one and only one foreground image point) by the recursive relation $X(t_{k+1}) = X(t_k) / \bigcup_{q} (X(t_k) \circledast R^{(q)})$, where $0 \le k < m$; and 4) pick up the original seeds by $Y = X(t_m) \circledast Q$, where $Q$ (Fig. 3(e)) is a reference image pair with one and only one foreground image point at the center and $Y$ is the final recognition output. By selecting good reference image pairs associated with the growing sequences of rotation patterns, we can extend shift and scale invariance to include rotation invariance in a similar way. This algorithm can efficiently reduce the computation complexity for a certain class of pattern recognition and symbolic substitution problems; their computation times depend only on the diameter of the largest desired pattern, but not on the number of patterns nor the size of the whole image.

A digital optical cellular image processor (DOCIP) [1] [2] implements all the above algorithms of symbolic substitution and pattern recognition in a flexible and efficient way compared to a symbolic substitution processor (Fig. 2) with $p$ fixed recognizer-substituter units. The DOCIP programming for these algorithms will be illustrated.

Figure 1. BIA representation of symbolic substitution. The optional mask M is for controlling the block search region.

Figure 2. A symbolic substitution system with p symbolic substitution rules.

(b) The output image Y.

(c) The growing sequence of square patterns $T_i$, $0 \le i \le 4$.

(d) A set of good reference image pairs $\{R^{(q)}\}$ for square patterns with different scales.

1: foreground points with value 1
b: background points with value 0

(e) The reference image pair Q.

Figure 3. A shift and scale invariant pattern recognition of square patterns.

2.2  Digital  Optical  Cellular  Architectures 


The  papers  reprinted  in  this  section  discuss  details  of  optical  cellular  architectures  and  their  in¬ 
struction  set. 


The  DOCIP  is  a  2-D,  page  oriented  array  of  individual  processors  located  at  every  pixel  of  a 
large  image.  The  attached  paper  by  K.S.  Huang,  B.K.  Jenkins  and  A. A.  Sawchuk,  “Binary  Image 
Algebra  and  Optical  Cellular  Logic  Processor  Design”,  submitted  to  Computer  Vision,  Graphics 
and  Image  Processing,  summarizes  some  of  these  concepts  and  their  algebraic  background.  Fol¬ 
lowing  this  paper  is  “Optical  Symbolic  Substitution  and  Pattern  Recognition  Algorithms  Based 
on  Binary  Image  Algebra”,  by  K.S.  Huang,  B.K.  Jenkins  and  A. A.  Sawchuk,  from  the  ICO  Topi¬ 
cal  Meeting  on  Optical  Computing,  Toulon,  France,  1988,  which  contains  additional  information. 


This paper is concerned with the hardware implementation of one cell of a prototype digital optical cellular image processor (DOCIP).

Abstract

A processing element of a prototype digital optical cellular image processor (DOCIP) is implemented to demonstrate a particular parallel computing and interconnection architecture.

Summary

Digital optical cellular image processor (DOCIP) architectures, DOCIP-array and DOCIP-hypercube, can perform the tasks of parallel binary image processing and parallel binary arithmetic [1]. The use of optical interconnections permits a cellular hypercube topology to be implemented without paying a large penalty in chip area (the cellular hypercube interconnections are space-invariant which implies relatively low hologram complexity); it also enables images to be input to and output from the machine in parallel. Table 1 gives a comparison of three different interconnection networks: cellular array (DOCIP-array interconnection network), conventional hypercube, and cellular hypercube (DOCIP-hypercube interconnection network). In this paper we experimentally demonstrate the concept of the DOCIP architecture by implementing one processing element of a prototype optical computer including a 49-gate processor, an instruction decoder, and electronic input/output interfaces.

A multiple-exposure multi-facet interconnection hologram provides the fixed interconnections between the outputs and the inputs of an array of 7 x 7 optical gates. The input data and the instructions are supplied from an LED array. The outputs of optical gates are detected by a video camera and compared with the results of a software simulation. A diagram of the main components of this experimental system is shown in Fig. 1.

A space-variant interconnection system [2] for within-processor interconnection is used in this experimental demonstration. A computer controlled system is used to make an array of 49 interconnection subholograms. An optical point source S, whose position is controlled by the mirror M2 with two rotational stages (Fig. 1), is used to provide an object beam for determining an interconnection of a subhologram in the multi-facet hologram. A mask with a circular aperture, controlled by two translational stages, is used to determine the sizes and positions of subholograms in a holographic plate. The interconnection hologram for this 49-gate optical processing element comprises 49 subholograms, which are laid out in a 7 x 7 array. Each subhologram covers a circular area with a diameter of 1.5 mm. The spacing between the centers of two subholograms is 3.0 mm. Note that the path of the object beam and the mask for subholograms are only used for making the interconnection hologram; they are blocked or moved when we reconstruct the hologram to implement the interconnections of the optical gates. We use a volume phase hologram with a dichromated gelatin medium for obtaining high diffraction efficiencies.

The array of 7 x 7 optical gates is implemented by a Hughes liquid-crystal light valve (LCLV) with liquid-crystal
molecules in a 45° twisted nematic configuration [2]. The LCLV is read out between crossed polarizers and is biased to
implement a NOR operation. The gate size in this experiment has a diameter of 0.3 mm and the spacing between the
centers of two gates is 0.6 mm.

The circuit diagram of the processing element, as shown in Fig. 2, consists of 49 NOR gates with maximum fan-in
of 3 and fan-out of 4. The processing element includes a 3-bit destination selector, a 3-bit master-slave flip-flop memory,
a 6-bit memory selector with a union module, and a 5-bit neighborhood selector (for DOCIP-array4 [1]) with a dilation
module. This experimental DOCIP system has one instruction, supplied from an LED array and decoded by the optical
hardware. This instruction has the format $(c, d_1, d_2, d_3, s_1, s_2, \ldots, s_6, n_1, n_2, \ldots, n_5)$, where $c$ selects the
image from the input or from the feedback; $d_1$, $d_2$, and $d_3$ select the destination memory for storing the image;
$s_1, s_2, \ldots, s_6$ select the output from the memory elements; and $n_1, n_2, \ldots, n_5$ control the neighborhood mask,
i.e. supply the reference image. We will experimentally demonstrate the DOCIP architecture concept with this system.
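
As an illustration only, the 15-bit instruction format described above can be modeled in software. The field names follow the text (one source-select bit, three destination bits, six memory-select bits, five neighborhood bits); the bit ordering within the word, the class name `DocipInstruction`, and the helper names `decode` and `encode` are assumptions made for this sketch and are not taken from the experimental system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DocipInstruction:
    """Hypothetical software model of the single DOCIP instruction word."""
    c: int                 # source select: external input or feedback (polarity assumed)
    d: Tuple[int, ...]     # d1..d3: destination memory select
    s: Tuple[int, ...]     # s1..s6: memory output select
    n: Tuple[int, ...]     # n1..n5: neighborhood mask (reference image)


def decode(bits):
    """Split a sequence of 15 binary values into fields, in the order given in the text."""
    bits = tuple(bits)
    if len(bits) != 15 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly 15 binary values")
    return DocipInstruction(c=bits[0], d=bits[1:4], s=bits[4:10], n=bits[10:15])


def encode(instr):
    """Inverse of decode: flatten the fields back into a 15-tuple."""
    return (instr.c,) + tuple(instr.d) + tuple(instr.s) + tuple(instr.n)
```

The field widths sum to 1 + 3 + 6 + 5 = 15, so one LED per bit would suffice under this assumed layout.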

Table 1. A comparison between three different
interconnection networks of NxN processing
elements (PEs): cellular array, conventional
hypercube and cellular hypercube. When laid out
on a VLSI chip, both the conventional hypercube
and cellular hypercube pay a large penalty in chip
area while the cellular hypercube has a relatively
low hologram complexity.


Figure 1. Experimental DOCIP system. Lens L1 images
from the LCLV gate output plane to the hologram plane.
Beam splitter BS3 combines the external input signals
from the LED array and the feedback signals from the
interconnection hologram. LP1 and LP2 are lens-pinhole
assemblies. P1 and P2 are crossed polarizers. The
hologram comprises an array of subholograms. Mirror M2
controls the position of point source S during hologram
exposure. After the hologram is made, the mask and all
components in the path from BS1 to the hologram are
not needed.

Figure 2. The circuit diagram of a 49-gate processing element of the DOCIP-array4.


2.6  Parallel  Processing  and  Optical  Computing 


A third area of research on this grant has been the general investigation of the impact of optical
computing technology on parallel computing architectures, including consideration of SIMD,
MIMD and data flow structures. We have studied the relationship of these architectures at low
to high levels of processor granularity. The following paper, “Parallel Processing Paradigms and
Optical Computing” by B.K. Jenkins and A.A. Sawchuk, which appeared in the Proc. Optical
Computing Symposium, SPIE Vol. 625, Los Angeles, January 1986, discusses shared memory
and graph/network models for parallel computing in the context of the physical constraints and
technology of optical computing.

ABSTRACT

Parallel  processing  models  as  computational  paradigms  are  discussed  and  related  to  optical  computing. 
Two  classes  of  parallel  computing  models  are  discussed  -  shared  memory  models  and  graph/network  models. 
These  models  are  used  to  analyze  some  of  the  possible  effects  of  optical  technology  on  parallel  computing.  It  is 
found  that  the  use  of  optics  potentially  provides  certain  fundamental  advantages.  In  addition,  some  factors  that 
limit  the  communication  capabilities  of  optical  systems  in  the  case  of  network  models  are  found. 

INTRODUCTION 

In this paper we look at paradigms and models for parallel processing as an attempt to increase our
understanding of the role of optical computing. Most of the parallel architectures discussed in the parallel processing
community are heavily influenced by the constraints of electronic systems. The purpose of our approach in this
paper is to abstract the notion of parallel computing from the limitations of any given technology. This abstract
model can then be used as a starting point for the design of parallel optical computing architectures. In the
process, some of the consequences of inherent differences between optical and electronic systems start to become
apparent.

Computing paradigms are important for understanding the level and class of problems that the computer
scientist is addressing. Consider the following structural paradigmatic classification: physical, functional,
computational. A representation and example of each of these paradigms is illustrated in Table 1. Here we are only
concerned with the computational paradigm and the optical implications.

Table 1. Processing paradigm levels

PARADIGMS       REPRESENTATION        EXAMPLE
Physical        Hardware/Technology   IC, Board
Functional      Architecture          PE, Memory, Interconnection Topology
Computational   Algorithms/Metrics    Turing Machine, Automata, Random Access Machine

Before discussing computational models, both sequential and parallel, we define computational order or
complexity as it is used in this paper. The interest here is in establishing a quantitative measure of the
computational power or cost of a problem, task or algorithm of size $n$. The parameter $n$ provides a measure of the
difficulty of the problem in the sense that the time required to solve the problem or the storage space required,
or both, will increase as it grows. The measure or cost of running or executing an algorithm on a problem of size
$n$ is defined as the complexity function $f$. Thus, $f$ is a measure of the time or space required for the
execution of the algorithm. For a time measure, $f(n)$ is called the time complexity function; for a space or storage
measure, $f(n)$ is called the space complexity function. Unless otherwise denoted, the complexity function used in the
paper is the time complexity function.

Our principal concern is with the performance of algorithms for large values of $n$, i.e. the asymptotic
behavior of the complexity function. If the value of $n$ is sufficiently small, then even inefficient algorithms will cost
the same to run. We assume the choice of an algorithm for small problems is not usually critical. The asymptotic
behavior of $f$ is defined as $O(f)$, the order of $f$. We will not give a formal definition of $O(f)$ but
illustrate its properties in Table 2. For a formal definition and more extensive discussion of these concepts see
Stanat and McAllister (1977). Table 3 illustrates the growth of certain complexity functions as a function of the
size of the problem (after Stanat and McAllister). As one can see, a problem can get out of control rather
quickly for certain orders of complexity.
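
The growth of complexity functions with problem size can be tabulated directly. The following is a minimal sketch, not a reproduction of the paper's Table 3: the particular functions, the problem sizes, and the function name `growth_table` are chosen here for illustration.

```python
import math


def growth_table(sizes=(10, 20, 50, 100)):
    """Return one row per problem size n with several common complexity functions."""
    rows = []
    for n in sizes:
        rows.append({
            "n": n,
            "log2 n": math.log2(n),
            "n log2 n": n * math.log2(n),
            "n^2": n ** 2,
            "n^3": n ** 3,
            "2^n": 2 ** n,   # exact integer; grows far faster than any polynomial
        })
    return rows


if __name__ == "__main__":
    for row in growth_table():
        print(row)
```

Even over this small range the exponential column outgrows the polynomial ones by many orders of magnitude, which is the point made in the text.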

Table  2.  Examples  of  computation  complexity  and  order. 

Table 3. Growth of some common complexity functions. The entries are proportional to the time required to
solve a problem of size n.

Computational models are important because they measure the performance of general classes of both
sequential and parallel algorithms on an idealized abstract machine. However, the performance of these models
is highly dependent on the class of algorithms. If the generic class of algorithms is known for a specific problem
(e.g. the communication algorithms of broadcasting, reporting, sorting, etc.), then the computational model
which efficiently runs these algorithms would be a starting point for the design of a computer architecture that
would do the same. The basic assumption is that algorithms which run well on a computational model should
run well on the model-derived architecture. Our intention is to show that optics has a greater potential than
electronics for physically realizing some of these computational models.

SEQUENTIAL  COMPUTATIONAL  MODELS 

Since parallel computational models are for the most part extensions of sequential models, we briefly
discuss these sequential machines. The most primitive and basic of the sequential machines is a Turing machine
(TM), of which there exist many forms: universal, non-deterministic, multi-tape, multi-head, 2-D tape, finite state
automata, etc. (A finite state automaton (FSA) can be described by a TM in which the tape moves in only one
direction.) The universal TM has the capability of computing any algorithm that is computable (a rather
circular thesis since a universal TM defines what is computable). A principal application of the TM is in determining
lower bounds on the space or time necessary to solve algorithmic problems. Since the TM is a well-known
computational model, we highly recommend for further interest the very informative text by Minsky (1967).

The Random Access Machine (RAM) is a less primitive computational model which can be stylized as a
primitive computer. The RAM model is a one-accumulator computer in which the instructions are not allowed to
modify themselves. Figure 1 illustrates a RAM which consists only of a read-only input tape, a write-only
output tape, a program and a memory (Aho, Hopcroft, and Ullman, 1974). Notice the close similarity to a TM. In
fact time on the RAM is bounded above by a polynomial function of time on the TM. In particular, for a TM of
time complexity $T(n) > n$, a RAM can simulate the TM in $O(T(n))$ or $O(T(n) \log n)$ time, depending on the
cost function used for the RAM. For the converse, using a TM to simulate a RAM, the bounds on time required
by the TM are higher and are highly dependent on the RAM cost function used (Aho, Hopcroft, and Ullman,
1974). The program of a RAM is not stored in memory and is unmodifiable. A sample RAM instruction set is
shown in Table 4. A common RAM model is the uniform cost one, which assumes that each RAM instruction
requires one unit of time and each register one unit of space. It is from attempts to parallelize the RAM
computational model that many parallel computational models emerged.
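
To make the uniform cost RAM concrete, the following is a minimal interpreter sketch. The instruction subset, the tuple encoding of instructions, the `("=", k)` notation for literal operands, and the names `run_ram` and `SUM_PROGRAM` are assumptions made for illustration; they are not the paper's Table 4. The program is held outside the register memory, so it cannot modify itself, and each executed instruction is charged one time unit.

```python
def run_ram(program, input_tape, max_steps=10_000):
    """Run a RAM program; return (output_tape, steps) under the uniform cost model."""
    mem = {}                      # registers; register 0 is the accumulator
    out = []                      # write-only output tape
    inp = iter(input_tape)        # read-only input tape
    pc = 0                        # program counter (program is not stored in mem)
    steps = 0

    def val(operand):
        # ("=", k) denotes the literal k; a bare integer i denotes register i
        if isinstance(operand, tuple):
            return operand[1]
        return mem.get(operand, 0)

    while steps < max_steps:
        op, arg = program[pc]
        steps += 1                # uniform cost: one time unit per instruction
        pc += 1
        if op == "LOAD":
            mem[0] = val(arg)
        elif op == "STORE":
            mem[arg] = mem.get(0, 0)
        elif op == "ADD":
            mem[0] = mem.get(0, 0) + val(arg)
        elif op == "SUB":
            mem[0] = mem.get(0, 0) - val(arg)
        elif op == "READ":
            mem[arg] = next(inp, 0)
        elif op == "WRITE":
            out.append(val(arg))
        elif op == "JUMP":
            pc = arg
        elif op == "JGTZ":        # jump if accumulator is greater than zero
            if mem.get(0, 0) > 0:
                pc = arg
        elif op == "JZERO":       # jump if accumulator is equal to zero
            if mem.get(0, 0) == 0:
                pc = arg
        elif op == "HALT":
            return out, steps
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("step limit exceeded")


# Example program: read n, then write n + (n-1) + ... + 1.
SUM_PROGRAM = [
    ("READ", 1),          # 0: r1 = n
    ("LOAD", ("=", 0)),   # 1: acc = 0
    ("STORE", 2),         # 2: r2 = 0 (running sum)
    ("LOAD", 1),          # 3: acc = r1
    ("JZERO", 11),        # 4: finished when r1 == 0
    ("ADD", 2),           # 5: acc = r1 + r2
    ("STORE", 2),         # 6: r2 = acc
    ("LOAD", 1),          # 7: acc = r1
    ("SUB", ("=", 1)),    # 8: acc = r1 - 1
    ("STORE", 1),         # 9: r1 = acc
    ("JUMP", 3),          # 10: repeat
    ("WRITE", 2),         # 11: output the sum
    ("HALT", None),       # 12
]
```

For input n this program executes 8n + 7 instructions, so its uniform cost time complexity is linear in the value read.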


Table  4.  Sample  RAM  instruction  set. 
JGTZ  is  jump  if  greater  than  zero. 
JZERO  is  jump  if  equal  to  zero. 

Fig. 1. Random access machine (RAM)

SHARED  MEMORY  MODELS 

We will discuss only two classes of parallel computational models: shared-memory models and
graph/network models. As might be inferred from the shared memory term, these models are based on global
memories and are differentiated by their accessibility to memory. In Fig. 2 we see a typical shared memory
model where individual processing elements (PE’s) have variable simultaneous access to an individual memory
cell. (A processing element is a physically isolated computational unit consisting of some local memory and
computational power. A PE can be construed as a computational primitive from which more sophisticated
architectures can be constructed (Hwang and Briggs, 1984).) Each PE can access any cell of the global memory in unit
time. In addition, many PE’s can access many different cells of the global memory simultaneously. In the
models we discuss, each PE is a slightly modified RAM without the input and output tapes, and with a modified
instruction set to permit access to the global memory. A separate input for the machine is provided. A given
processor can generally not access the local memory of other processors.


Fig. 2. Conceptual diagram of shared memory models.

Fig. 3. ... showing multiple optical beams providing contention-free read access.

The models differ primarily in whether they allow simultaneous reads and/or writes to the same memory
cell. The PRAC, parallel random access computer (Lev, Pippenger and Valiant, 1981), does not allow
simultaneous reading or writing to an individual memory cell. The PRAM, parallel random access machine (Fortune and
Wyllie, 1978), permits simultaneous reads but not simultaneous writes to an individual memory cell. The
WRAM, parallel write random access machine, denotes a variety of models that permit simultaneous reads and
certain writes, but differ in how the write conflicts are resolved. For example, a model by Shiloach and Vishkin
(1981) allows a simultaneous write only if all processors are trying to write the same value. The paracomputer
(Schwartz, 1980) has simultaneous writes but only “some” of all the information written to the cell is recorded.
The models represent a hierarchy of time complexity given by


$$T_{\mathrm{PRAC}} \ge T_{\mathrm{PRAM}} \ge T_{\mathrm{WRAM}}$$

where  T  is  the  minimum  number  of  parallel  time  steps  required  to  execute  an  algorithm  on  each  model.  More 
detailed  comparisons  are  dependent  on  the  algorithm  (Borodin  and  Hopcroft,  1985). 
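
The access rules that separate these models can be stated as a small predicate. This is a sketch under simplifying assumptions: each parallel step is taken to consist either entirely of reads or entirely of writes, the WRAM branch uses the same-value rule attributed above to Shiloach and Vishkin, and the function name `step_allowed` and its argument layout are hypothetical.

```python
from collections import defaultdict


def step_allowed(model, op, accesses):
    """Decide whether one parallel memory step is legal under a given model.

    model    -- "PRAC", "PRAM", or "WRAM"
    op       -- "read" or "write" (all PE's do the same kind of access in a step)
    accesses -- list of (cell, value) pairs, one per participating PE;
                value is ignored for reads
    """
    by_cell = defaultdict(list)
    for cell, value in accesses:
        by_cell[cell].append(value)

    # True when no memory cell is touched by more than one PE in this step
    exclusive = all(len(values) == 1 for values in by_cell.values())

    if model == "PRAC":
        return exclusive                      # no simultaneous access at all
    if model == "PRAM":
        return op == "read" or exclusive      # simultaneous reads only
    if model == "WRAM":
        if op == "read":
            return True
        # simultaneous writes allowed only if every writer agrees on the value
        return all(len(set(values)) == 1 for values in by_cell.values())
    raise ValueError(f"unknown model {model!r}")
```

Every step that is legal on the PRAC is legal on the PRAM, and every step legal on the PRAM is legal on the WRAM, which is why the minimum number of parallel time steps cannot increase along that sequence.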

Implications  of  optics 

In general, none of these shared memory models are physically realizable because of actual fan-in limitations.
Optical interconnections permit greater fan-in than electronic systems. In addition, the non-interacting property
of photons in a linear medium (versus the mutual interaction of electrons) may permit simultaneous memory
reads much more easily. As an electronic example, the ultracomputer (Schwartz, 1980) is an architectural
manifestation of the paracomputer that uses a hardwired Omega network between the PE’s and memories; it
simulates the paracomputer within a time penalty of $O(\log n)$.

Optical systems could in principle be used to implement this parallel memory read capability. As a simple
example, a single 1-bit memory cell can be represented by one pixel of an array; the bit could be represented by
the state (opaque or transparent) of the memory cell. Many optical beams could simultaneously read the
contents of this memory cell without contention (Fig. 3). In addition to this, an interconnection network is needed
between the PE’s and the memory that can allow any PE to communicate with any memory cell, preferably in
one step, and with no contention. A crossbar is not sufficient for this because fan-in to a given memory cell
must be allowed. Optical systems can potentially implement crossbars that also allow this fan-in. For example,
some of the optical crossbar designs discussed in Sawchuk and Jenkins (1986) can include fan-in capability.


GRAPH/NETWORK  MODELS 

Graph/network models are characterized by a collection of usually identical PE’s that are interconnected
with a fixed network. They can be represented by graphs, with a node of the graph for each PE and an arc or
link of the graph for each PE to PE interconnection. The models differ from one another in the length of time
required for a message to traverse one arc of the graph, and on the assumptions placed on the PE’s such as their
ability to respond to multiple messages. The feasibility of implementation of these models depends on the
connectivity of the graph; if the connectivity is not too high, the model is much more readily implemented than the
shared memory models.

Network models can be compared to shared memory models. Any of the shared memory models can
efficiently simulate (in $O(1)$ time) a network model. This is done by dedicating a different cell of the global
memory for each link of the network. One PE sends a message to another by writing the message to a memory
cell which the other PE then reads. Conversely, suppose the network model is capable of (partial) routing in
$r(n)$ time. Then it can simulate one step of the PRAC, PRAM, or WRAM in $O(r(n))$ time (Borodin and
Hopcroft, 1985).
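
The first direction of this comparison, a shared memory simulating a network, can be sketched in a few lines. The class name `SharedMemoryNetwork`, the use of directed `(src, dst)` pairs as link identifiers, and the convention that a read clears the cell are assumptions of this illustration.

```python
class SharedMemoryNetwork:
    """Simulate a fixed network by dedicating one global-memory cell per link."""

    def __init__(self, links):
        # links: iterable of directed (src, dst) pairs; one cell per link
        self.cell = {link: None for link in links}

    def send(self, src, dst, message):
        """PE src writes its message into the cell dedicated to link (src, dst)."""
        if (src, dst) not in self.cell:
            raise KeyError(f"no link from {src} to {dst}")
        self.cell[(src, dst)] = message

    def receive(self, src, dst):
        """PE dst reads (and clears) the cell dedicated to link (src, dst)."""
        message = self.cell[(src, dst)]
        self.cell[(src, dst)] = None
        return message
```

Because each cell has exactly one writer and one reader, no cell is ever accessed by two PE's in the same step, so one network time step costs one write step plus one read step on any of the shared memory models.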

In a highly parallel machine communications are exceedingly important and for many tasks can dominate
the execution time of the algorithm. We therefore concentrate on communications in our analysis of these
models. The effectiveness of different PE network topologies can be evaluated by comparing metrics, essentially
measures of the topological characteristics, or by comparing the number of time steps required to complete
various fundamental communication tasks or algorithms. Examples of metrics include diameter, the shortest
distance between the two most separated nodes where distance is measured in terms of number of links, and
bandwidth, the maximum number of messages that can be simultaneously sent over the network in one time
step. Levitan (1985) compared different architectures based on network models using both metrics and
communication tasks, and concluded that communication tasks are a better predictor of actual run time performance on
a given topology than are metrics. In our analysis we will use communication tasks.
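
The diameter metric defined above can be computed by breadth-first search. The sketch below compares a square array with 4-neighbor links (no wraparound) and a conventional binary hypercube; the function names and the choice of these two topologies are illustrative assumptions.

```python
from collections import deque


def mesh(side):
    """Adjacency lists of a side x side array with 4-neighbor links, no wraparound."""
    adj = {}
    for r in range(side):
        for c in range(side):
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def hypercube(dim):
    """Adjacency lists of a binary hypercube: neighbors differ in exactly one address bit."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(1 << dim)}


def eccentricity(adj, start):
    """Largest link distance from start to any node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Distance between the two most separated nodes of a connected graph."""
    return max(eccentricity(adj, v) for v in adj)
```

For 16 PE's, `diameter(mesh(4))` is 6 while `diameter(hypercube(4))` is 4; the gap widens with N, since the array diameter grows as the square root of N and the hypercube diameter as log N.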

PE  Complexity  and  Communications 

Since the performance of network models depends on the assumptions on the individual processing
elements, we need to consider these assumptions and their relationships to communication tasks. We will show
that in general the communications between PE’s (or the network topology) cannot be completely decoupled
from the hardware complexity of the PE’s themselves. After giving a relationship between PE space complexity
and interconnection capability, we will be able to identify what reasonable assumptions on the PE complexity
are for the optics and electronics cases. These assumptions will be used in assessing the performance of different
communication tasks on network models. In this paper the term PE complexity refers to the space complexity
of each PE. We will not discuss time complexity of individual PE’s.


For simplicity, we will assume the bandwidth of each I/O line to a PE is fixed and is given. Thus we are a
priori not considering one of the potential advantages of an optical system over an electronic one. We will,
however, consider the effect of the number of I/O lines to a PE. In the case of input lines, the signals coming in
may be immediately combined, or may be kept separate and stored into separate registers. Consider the former
case. Examples include forming the sum of all simultaneous inputs or just forming the logical AND over all of
them. In the simplest case of a logical operation over all inputs, the PE must accommodate the required fan-in.
To do this with gates of a fixed size (fixed fan-in per gate) requires $O(l_i)$ gates for $l_i$ input lines. Thus the PE
complexity grows $O(l_i)$ merely to accommodate the input lines. If the input gates are allowed to have a fan-in
that increases with $l_i$, then the PE complexity still grows with $l_i$ because the complexity of the input gates
(instead of the number of input gates) grows $O(l_i)$. In the case of the PE keeping the input signals separate,
$O(l_i)$ gates are needed for the input, and if stored into memory then $l_i$ memory cells are required. Thus the PE
complexity must be at least $O(l_i)$; if the PE can arbitrarily rearrange the signals in a small number of time
steps then the PE complexity grows even faster (e.g., $O(l_i^2)$ if a crossbar is used). Similar arguments can be
applied to the case of PE output lines. Thus a PE with $l_i$ input lines and $l_o$ output lines has complexity that
grows $O(l_i + l_o)$ in the simpler cases; if too many demands are placed on its ability to process or move these
signals around, then its complexity grows faster.
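
The linear growth in gate count for fixed fan-in gates can be checked with a short counting sketch. It assumes the inputs are combined by an associative operation such as AND, so that gates may be arranged in a tree; the function name `gates_needed` is hypothetical.

```python
def gates_needed(num_inputs, gate_fan_in):
    """Count fixed fan-in gates needed to combine num_inputs signals into one.

    Assumes an associative operation (e.g. logical AND), so the gates can be
    arranged as a tree; each gate replaces up to gate_fan_in signals by one.
    """
    if num_inputs < 1 or gate_fan_in < 2:
        raise ValueError("need at least one input and a gate fan-in of at least 2")
    count = 0
    signals = num_inputs
    while signals > 1:
        take = min(gate_fan_in, signals)
        signals = signals - take + 1   # 'take' signals in, one signal out
        count += 1
    return count
```

Each gate reduces the number of live signals by at most `gate_fan_in - 1`, so the count is the ceiling of `(num_inputs - 1) / (gate_fan_in - 1)`, which is linear in the number of input lines.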

Implications of this lie in the communication ability of PE networks, particularly in the optics case. With
electronic technology, the number of I/O lines to a PE is generally quite limited and this limits the ability of the
PE’s to communicate. This is due to limited pinout, cost of interconnections, etc. The PE complexity is in
practice not an issue for communications. In the optics case, however, there are no pinout restrictions and many
parallel interconnection lines are feasible. However, there are limitations on the total number of
interconnections in an optical PE network; these are due to the PE complexity itself. In other words, the PE’s have to be
able to accommodate all of the I/O lines. The optics case apparently allows a balance between the
interconnections and the PE complexity; in the electronics case the interconnections are further limited by technology
factors.

Consider a fully connected array, that is, one in which every PE has a hardwired line to every other PE.
For a network of $N$ PE’s, the complexity of each PE grows $O(N)$ because it has $N$ I/O lines; therefore the
total complexity of all the PE’s is $O(N^2)$. In this case the total number of interconnection lines is also $O(N^2)$.
The total complexity of the PE network is $O(N^2)$.

We  consider  three  specific  examples  of  PE  complexity  in  the  case  of  a  fully  connected  array. 

Example  1.  The  PE’s  are  made  up  of  binary  gates,  either  optical  or  electronic.  This  example  was 
included  in  the  discussion  above;  each  PE  has  O  (N)  gates  and  has  complexity  that  grows  O  (N)  in  the  simplest 
cases. 


Fig.  4.  Optical  inner  product  matrix-vector  multiplier  as  an  example  of  a  fully  connected  array. 

Example 2. Consider the optical matrix-vector multiplier of Fig. 4. This is also a fully connected array,
because in general every input or source is connected to every output or detector. Thus we expect each “PE” to
have complexity at least $O(N)$. Each PE can be viewed as a detector, any thresholding, A/D, or processing
electronics, and a source (which is part of the same PE if feedback is included). The 2-D SLM or mask is
considered part of the interconnections in this case. Even for the simple case of binary sources and mask
transmittances, each detector must distinguish $O(N)$ levels. In addition, there are accuracy requirements on the source
intensities for these levels to be distinguishable. We conjecture that the PE complexity must increase at least
$O(N)$. This is clear if we assume that the ability to distinguish an analog signal to within 1 part in $N$ implies
a complexity of $O(N)$. Thus the total complexity of all PE’s in the network must grow at least $O(N^2)$.
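
The claim that each detector must distinguish a number of levels that grows with N can be checked by brute force for small N. The sketch below idealizes the multiplier as an exact integer sum with binary sources and binary mask transmittances; the function names are hypothetical and no optical noise or nonuniformity is modeled.

```python
from itertools import product


def detector_outputs(mask, sources):
    """Ideal matrix-vector product: each detector sums the source intensities
    passed by its row of mask transmittances."""
    return [sum(m * s for m, s in zip(row, sources)) for row in mask]


def distinct_levels(n):
    """All values one detector can see for binary sources and a fully
    transparent (all-ones) mask row of length n."""
    row = [1] * n
    return sorted({detector_outputs([row], list(src))[0]
                   for src in product((0, 1), repeat=n)})
```

For example, `distinct_levels(4)` yields `[0, 1, 2, 3, 4]`: a detector fed by N binary sources must resolve N + 1 intensity levels in this idealized model.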


Example 3. Consider again the optical matrix-vector multiplier, but now define each PE to include the
multiplies also. Each PE is physically distributed, and includes $N$ mask pixels (shown), 1 detector (shown), and
1 source. There are $N$ PE’s; each PE performs $N$ multiplies and $N$ adds, and has $N$ spatially separate input
lines (at the 2-D mask). The network is still fully connected, as each source (PE output) fans out to all PE’s. In
this case, the accuracy requirements on the sources, detectors and ensuing electronics are the same as in the
previous example. In addition to this, the accuracy of each multiply (the pixel transmittance) must be taken into
account; one PE contains $N$ of these multiplies and the accuracy (and therefore the complexity) of each
multiply must increase with $N$. Thus the complexity of each PE must increase faster than $O(N)$; the complexity of
the network grows faster than $O(N^2)$. This is reflected in the hardware requirements in making a large fully
connected (optical) system.

Communication  Tasks  on  Network  Models 

In this section we will give the time required to execute different communication tasks on different network
topologies. We are concerned with fine-grained systems, that is, systems with a large or very large number of
relatively simple PE’s. As a minimum, we assume each PE can store its own address so that it knows where it is
located. Many algorithms can become quite difficult without this feature. This implies that the PE complexity
must be allowed to grow $O(\log N)$.

In an electronic system, the number and length of interconnections is important and ideally should be
minimized. The number of connections to each PE or node of the graph is limited to small values due to I/O
constraints. This limits the connectivity of graphs that can be efficiently implemented. The degree of a graph is
the number of lines connected to each node. Electronic systems limit the degree of the graph to a relatively small
value; for large enough $N$ the degree must be a constant, independent of $N$.

Optical systems have no I/O restrictions on the PE’s per se, but as discussed above the degree of the graph
will be limited by the complexity of the PE’s. Since the PE complexity must be at least $O(\log N)$ anyway, in
the optics case the degree of the graph can easily be $O(\log N)$. Larger degrees, e.g. $O(N^{1/p})$, where $p > 2$, may
also be feasible.

The time required for different interprocessor communication tasks performed on different fixed networks
has been studied by Kushner and Rosenfeld (1983) and Levitan (1985). They show substantially reduced
computation time for many communication tasks on networks of larger degree (e.g. hypercubes), as compared to
simpler networks such as arrays. Augmented trees (Uhr, 1983) and bushy trees may also permit reduced time
for these tasks. Examples of bushy trees are trees of degree 1+m, which have diameter 2p, where m is the
number of leaf nodes and is a power of p.

In order to calculate communication times on a network model, certain assumptions need to be specified.
We assume that all messages are the same size and are routed to their destinations over the fixed connection
network by passing over links and through PE’s. One time step is defined as the time for a PE to send a message,
the message to travel over one link, be received by the PE at the end of the link, and for the PE to perform any
computation on the message (such as altering its tag or combining messages that arrive simultaneously). The
processors operate synchronously. Finally, the number of messages that can simultaneously be accepted or
output by each PE must be considered. In the electronics case, the number of messages that can be simultaneously
accepted by a PE is relatively small (because of the degree limitation), and will probably need to be a constant
independent of $N$ (Kushner and Rosenfeld, 1983). For simplicity this can be taken to be 1. A PE can output
identical copies of the same message, but not multiple messages. For the optics case, we assume only a limit on
the PE complexity; this then dictates how flexible the inputs and outputs of the PE can be. We limit the PE
complexity to the degree of the network or $\log N$, whichever is greater. Each PE can accept $d$ simultaneous
messages, where $d$ is the degree of the network, and may increase with $N$. Each PE can output $d$ identical
messages simultaneously; outputting different messages simultaneously (in conjunction with inputting several
messages simultaneously) can involve an increase in PE complexity, depending on what the PE is required to do.

Kushner and Rosenfeld (1983) classify communication tasks as one-to-many, many-to-one, and one-to-one.
One-to-many tasks include broadcasting, in which one PE (the root if there is a node so distinguished) sends the
same message to many other PE’s, in the worst case to all other PE’s. In the more general one-to-many case the
messages may be altered as they travel, e.g. each message could have a value that is incremented by one each
time it passes through a PE; thus it keeps track of the distance it has traveled. Broadcasting must take time at
least as long as the distance to the farthest node to be reached.

Many-to-one tasks must be divided into two classes. In both classes many PE’s all send messages to the
same PE (root). In one case, condensing, the messages can be combined (e.g. added) en route to the destination.
An example of this would be computing the area of a region: each PE in the region sends a 1, and the sum
of all messages is equal to the area. This is essentially the inverse of broadcasting, and the time is again limited
by the farthest distance to be traveled. In the case of funneling, the messages must be kept separate. This in
general takes much longer. If all N PE’s send messages to be funneled, then the time is bounded below by
N/d, where d is the network degree, because of the “bottleneck” at the destination node. Whether this lower
bound is achieved depends on the network topology.

One-to-one tasks are permutations, in which each PE sends a message to one other PE. In the worst case,
half  the  PE’s  send  messages  to  the  other  half,  each  message  with  a  different  destination  PE.  This  of  course  must 
take  time  at  least  equal  to  the  farthest  distance  to  be  traveled  (the  diameter  of  the  network  for  the  worst  case). 
In  general,  bottlenecks  will  cause  the  time  to  be  larger  than  the  network  diameter;  actual  time  again  depends  on 
the  topology. 
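
The lower bounds quoted above can be illustrated with a small sketch. It is not taken from the original analysis: the helper names (`hypercube`, `mesh`, `farthest_distance`, `lower_bounds`) are hypothetical, the degree d is taken to be the largest node degree in the network, and only the distance bound (broadcasting, condensing) and the N/d bottleneck bound (funneling) are computed. Whether a given topology actually achieves these bounds is not addressed.

```python
from collections import deque
from math import ceil


def hypercube(n_dims):
    """Adjacency lists of a binary hypercube with 2**n_dims PEs."""
    n = 1 << n_dims
    return {u: [u ^ (1 << b) for b in range(n_dims)] for u in range(n)}


def mesh(side):
    """Adjacency lists of a side x side nearest-neighbor array (no wraparound)."""
    adj = {}
    for r in range(side):
        for c in range(side):
            nbrs = []
            if r > 0:
                nbrs.append((r - 1) * side + c)
            if r < side - 1:
                nbrs.append((r + 1) * side + c)
            if c > 0:
                nbrs.append(r * side + c - 1)
            if c < side - 1:
                nbrs.append(r * side + c + 1)
            adj[r * side + c] = nbrs
    return adj


def farthest_distance(adj, root):
    """Largest number of links between root and any PE (breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def lower_bounds(adj, root):
    """Time-step lower bounds for tasks rooted at `root` (assumed model)."""
    n = len(adj)
    degree = max(len(nbrs) for nbrs in adj.values())  # assumed meaning of d
    far = farthest_distance(adj, root)
    return {
        "broadcast": far,              # must reach the farthest PE
        "condense": far,               # inverse of broadcasting
        "funnel": ceil(n / degree),    # bottleneck at the destination PE
    }


if __name__ == "__main__":
    print("16-PE hypercube:", lower_bounds(hypercube(4), 0))
    print("4 x 4 array    :", lower_bounds(mesh(4), 0))
```

For a fixed number of PE’s, raising the degree lowers both the distance bound and the funneling bound, which is the effect the optics assumptions are meant to exploit.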

Table  5.  Order  of  magnitude  time  for  communication  tasks  on  a  fixed  interconnection  PE  network  with  N  PE’s. 
(1) From Kushner and Rosenfeld (1983).

(2) b = branching factor of tree = $m^{1/p}$, where m = no. of leaf nodes, p = radius = no. of levels - 1.

Fig. 5. Examples of network topologies.

The  worst  case  order  of  magnitude  communication  time  for  several  networks  of  different  topologies  and 
degrees  is  given  in  Table  5.  The  array  and  binary  tree  take  the  same  time  under  our  optics  and  electronics 
assumptions. The optics assumptions allow degrees that are a function of N, and further reduce the time for
funneling (from time = N) in some cases. Examples of nearest neighbor array, tree, hypercube, and fully connected
networks are shown in Fig. 5.


CONCLUSIONS 

We have studied abstract models of parallel machines at the computational paradigm level. By attempting
to abstract out the limitations of electronic systems, we have found some potential advantages of optical computing
systems in contention-free parallel read access to global memories, associated reconfigurable interconnection
networks, and in implementing PE networks of increased degree over electronics. We have also pointed
out that the connectivity of even optical systems is not unlimited; it is limited by the complexity of the
components that are being connected.
2.7  Acousto-optic  Signal  Processing


A final area of study has been in high speed acousto-optic systems for matrix-matrix multiplication.
The attached paper “Acousto-Optic Matrix-Matrix Multiplier” by D.S. Kalivas, G. Albanese and
A.A. Sawchuk in Optics Letters, Vol. 13, pp. 291-293, April 1988 summarizes these results.


Reprinted from Optics Letters, Vol. 13, page 291, April 1988.
A new architecture for an optical matrix-matrix multiplier is presented. It is based on the beam-modulation and
beam-deflection properties of Bragg acousto-optic cells. Its parallel structure makes it very fast. Some physical
limitations are discussed.


There exist many architectures for optical matrix algebraic processors.¹ Recently, several architectures
that use acousto-optic cells have been proposed.²,³ An interesting frequency-multiplexed pipelined
matrix-vector processor was presented by Casasent et al.⁴ This system is a systolic processor because the
inputs enter in a clocked time-sequential pipeline manner. It is a beam-deflector-based processor, which can
also be used for matrix-matrix multiplication. In this Letter we present a matrix-matrix multiplier that
makes use of both the beam-deflection and the beam-modulation properties of acousto-optic (AO) cells.
We describe the architecture of the processor, explain how it works, and consider some basic features
and limitations from a quantitative point of view.

Bragg cells have two basic properties that can be utilized in the design of an AO algebraic processor.⁵,⁶
The first is the modulation of the intensity of a light beam, which is obtained by modulation of the
amplitude of the acoustic wave. The second property is the deflection of light beams in different directions
caused by frequency modulation of the acoustic wave. Our processor exploits both properties.

The architecture is shown schematically in Fig. 1, and a top view is shown in Fig. 2. It multiplies two
matrices A and B of dimension N × N. Various stops and unwanted diffraction orders are omitted for clarity.
At the left is AO cell array A, composed of N independent AO cells. The direction of propagation
of the acoustic waves in each AO cell is oriented vertically. The N cells are arranged side by side
horizontally. The illumination on AO cell array A, shown by the arrows at the left of Fig. 1, is a plane
monochromatic wave of constant complex amplitude. The plane wave front is parallel to the left of the AO
cell array A, and its aperture is large enough to illuminate array A completely. The rows of the matrix A
drive the cells of AO cell array A (plane A) in parallel, one row per cell. The single long AO cell B
(plane B) in Fig. 1 is oriented vertically. It is shown artificially divided into N levels in Fig. 1 for
the purpose of explaining the operation of the processor. The AO cell B is driven by a vector b generated
by row scanning⁷ the matrix B. Let us call $b_{ij}$ (i = 1, 2, ..., N; j = 1, 2, ..., N) the elements of the
matrix B. Then b is equal to the vector
$(b_{11}, b_{12}, \ldots, b_{1N}, b_{21}, \ldots, b_{2N}, \ldots, b_{N1}, \ldots, b_{NN})^t$,
where t denotes the matrix transpose. In the illustrations of Figs. 1 and 2, both lenses have the same
focal length $f_l$, and all five components of the system are spaced at the same distance $f_l$. The lens
located between planes A and B brings the light from plane A to a line focus in plane B (Fig. 2), and the
AO cells in plane A provide a vertical deflection and amplitude modulation. At the right is an
instantaneous detector array C having N × N elements, which gives the output matrix C = A × B.

In each AO cell of the array A a transducer launches a multifrequency acoustic wave. Let us consider the
ith AO cell, which is fed with the ith row of the matrix A. Its input has N frequency components. The jth
component has amplitude $a_{ij}$ and frequency $f_j$. Thus the beam incident upon this cell is split into N
beams. These beams have intensities proportional to the matrix elements $a_{ij}$ (j = 1, 2, ..., N) and are
deflected at different angles $\theta_j$, which are proportional to the frequencies $f_j$. The frequency
$f_j$ is such that the deflected beam is directed to the jth level of the AO cell B.

[Figure labels: AO cell array A (plane A), lens, AO cell B (plane B), lens, detector array C.]

0146-9592/88/040291-03$2.00/0  © 1988, Optical Society of America

At this jth level of AO cell B there are N incident beams, one from each of the N AO cells of array A.
Synchronized with these beams, AO cell B contains a propagating multifrequency acoustic wave with
amplitudes $b_{jk}$ (k = 1, 2, ..., N) and frequencies $f_{jk}$ (k = 1, 2, ..., N). Each one of these incident
beams is split into N beams having intensities proportional to $a_{ij} b_{jk}$ (k = 1, 2, ..., N) and is
deflected at different angles $\theta_{jk}$ determined by the frequencies $f_{jk}$ (k = 1, 2, ..., N).
The frequency $f_{jk}$ is such that the corresponding deflected beam is directed vertically from the jth
level of AO cell B toward the kth row of the detector array. The horizontal angular offset of this beam is
converted by the second lens to a spatial location in the ith column. Thus the combination of these two
deflections directs the beam to the (i, k)th element of the detector array. At the same element (i, k) all
the beams having intensities proportional to $a_{ij} b_{jk}$ (j = 1, 2, ..., N) are summed and detected.
The result is proportional to the element $c_{ik}$ of the matrix C.
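
The beam routing just described can be mimicked numerically. The sketch below is only an idealized bookkeeping model, not a physical simulation: it assumes nonnegative matrix elements (since they are encoded as intensities), perfectly linear modulation, and no crosstalk between beams. The function names `row_scan` and `ao_matrix_multiply` are hypothetical.

```python
def row_scan(B):
    """Flatten matrix B row by row into the drive vector b for the long AO cell B."""
    return [element for row in B for element in row]


def ao_matrix_multiply(A, B):
    """Accumulate beam intensities on an N x N detector array.

    Cell i of AO cell array A splits its input beam into N beams with
    intensities proportional to a_ij, one per level j of AO cell B.
    Level j of cell B splits each incoming beam again, weighting it by b_jk
    and steering it to detector element (i, k), where the contributions
    from all levels j add up.
    """
    n = len(A)
    b = row_scan(B)                      # b[j * n + k] holds b_jk
    detector = [[0.0] * n for _ in range(n)]
    for i in range(n):                   # cell i of array A (row i of A)
        for j in range(n):               # frequency f_j -> level j of cell B
            beam = A[i][j]               # intensity proportional to a_ij
            for k in range(n):           # frequency f_jk -> detector (i, k)
                detector[i][k] += beam * b[j * n + k]
    return detector


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    for row in ao_matrix_multiply(A, B):
        print(row)
```

The innermost accumulation is the sum over j of $a_{ij} b_{jk}$, so the detector readout reproduces the ordinary matrix product under the stated idealizations.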

We denote by T the time it takes for the elements of matrix B to enter AO cell B. After matrix B enters
the system, the matrix-matrix multiplication is done instantaneously; thus the total processing time is
equal to T. The equivalent systolic matrix-matrix processor, presented by Casasent et al.,⁴ has a total
processing time equal to 2T.

The processing time can be drastically reduced if an array of AO cells as shown in Fig. 3 is substituted
for the single long AO cell B. AO cell array B is composed of N independent AO cells whose direction of
propagation is oriented vertically, and these N cells are stacked end to end vertically. Each AO cell of
array B is driven in parallel by the corresponding row of the matrix B, thus eliminating the row scanning
needed to input the entire matrix B as before. For example, the jth AO cell is driven by the jth row of
the B matrix $(b_{j1}, \ldots, b_{jk}, \ldots, b_{jN})$. The new total processing time is equal to T/N.
This time reduction is achieved at the cost of increased architectural complexity.

The AO cells described above serve as light modulators and beam deflectors. The equations that determine
these two operations are⁶

$$\eta = \sin^2\left(\frac{\pi L}{\sqrt{2}\,\lambda \cos\theta_B}\sqrt{M I_a}\right), \qquad (1)$$

where $\eta$ is the diffraction efficiency, L is the length of interaction between light and sound (i.e.,
the thickness of the AO cell along the direction of light propagation), $\lambda$ is the wavelength of light
in free space, $I_a$ is the acoustic intensity, M is a constant having dimensions of square meters per watt
defined by the AO cell material, and $\theta_B$ is the Bragg angle given by

$$\sin\theta_B = \frac{\lambda f_0}{2 n v}. \qquad (2)$$

Here $f_0$ is the acoustic frequency, v is the phase velocity of the sound, and n is the optical index of
refraction of the AO cell material. Assuming small acoustic power and a small Bragg angle, we can further
simplify the above equations to

$$\eta \approx \frac{\pi^2 L^2 M I_a}{2 \lambda^2 \cos^2\theta_B}, \qquad (3)$$

$$\theta_B \approx \frac{\lambda f_0}{2 n v}. \qquad (4)$$
From relation (3) we see that the diffraction efficiency is approximately linearly proportional to the
acoustic intensity $I_a$. Thus modulation of the acoustic intensity results in modulation of the diffracted
light intensity. Also, relation (4) shows that the deflection angle $2\theta_B$ is approximately linearly
proportional to the acoustic frequency.
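
The two approximations can be written out explicitly. The lines below are a sketch that assumes Eq. (1) has the usual $\sin^2$ form for Bragg diffraction and that Eq. (2) is the Bragg condition $\sin\theta_B = \lambda f_0 / 2nv$; the shorthand $x$ is introduced here only for the derivation.

```latex
% Small acoustic power: the argument x of the sine is small, so \sin x \approx x.
x = \frac{\pi L}{\sqrt{2}\,\lambda\cos\theta_B}\sqrt{M I_a},
\qquad
\eta = \sin^2 x \;\approx\; x^2
     = \frac{\pi^2 L^2 M I_a}{2\lambda^2\cos^2\theta_B}.

% Small Bragg angle: \sin\theta_B \approx \theta_B, so the deflection angle is
2\theta_B \;\approx\; \frac{\lambda f_0}{n v},
% which is linear in the acoustic frequency f_0.
```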

The most important parameters in this processor are the number of resolvable beam spots, or, in other
words, the dimension of the matrices that can be multiplied and the computation time T. In addition, the
processor must operate with acceptable efficiency.

The number of resolvable beam spots is given by⁶

$$N = \frac{\pi\,\Delta f\,D}{4 v \cos\theta_B}, \qquad (5)$$


where $\Delta f$ is the bandwidth of the acoustic wave around the center frequency $f_0$ and D is the
diameter of the illuminating beam measured along the direction of acoustic-wave propagation. Thus the
basic parameter that determines the dimension N is $\Delta f$, and it must satisfy the two conditions⁶


$$\Delta f < f_0, \qquad (6)$$

$$\Delta f < \frac{2 n v^2}{\lambda L f_0}. \qquad (7)$$

The condition of relation (6) ensures modulation without nonlinear distortion, while relation (7) arises
from the need to maintain the proper Bragg angle between incident light and acoustic waves in the AO cell
(a phase-mismatch condition). These conditions also affect the thickness L of the AO cell. To obtain a
large N a small L is desirable, but this results in low efficiency [Eq. (1)]. Thus there is a trade-off in
assigning a value for L. This problem can be overcome by using a beam-steering technique.⁸

We now consider a practical example and evaluate the dimension N, the computation time T, and the
diffraction efficiency $\eta$ for the system shown in Fig. 1. Let the medium of the AO cells be
$\mathrm{PbMoO_4}$, which has the parameters n = 2.4, v = 3.75 km/sec, and
$M = 7.3 \times 10^{-14}$ m²/W. Choosing $\lambda$ = 0.5145 μm, $f_0$ = 100 MHz, $\Delta f$ = 100 MHz,
$I_a$ = 250 mW/cm², L = 1 cm, and D = 3 mm, we obtain N = 60, T = 60 μsec, and $\eta$ = 0.34. If we
use instead the AO cell array B shown in Fig. 3 in the system of Fig. 1, the computation time is 1 μsec.
These results show that the processor is very fast and operates with acceptable efficiency.
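
The quoted figures can be checked with a few lines of arithmetic. The sketch below is hedged in two ways. First, it assumes the standard forms $\sin\theta_B = \lambda f_0 / 2nv$ for the Bragg angle and $\eta = \sin^2[(\pi L / \sqrt{2}\lambda\cos\theta_B)\sqrt{M I_a}]$ for the efficiency. Second, the expression used for the number of resolvable spots, $N = \pi \Delta f D / (4 v \cos\theta_B)$, is an assumed form chosen because it comes out close to the quoted N = 60. All function names are hypothetical.

```python
from math import asin, cos, pi, sin, sqrt

# Parameters of the practical example, converted to SI units.
WAVELENGTH = 0.5145e-6      # m, free-space wavelength
N_INDEX = 2.4               # optical index of refraction
V_SOUND = 3.75e3            # m/s, acoustic phase velocity
M_FIGURE = 7.3e-14          # m^2/W, material constant M
F0 = 100e6                  # Hz, acoustic center frequency
DELTA_F = 100e6             # Hz, acoustic bandwidth
I_ACOUSTIC = 250e-3 / 1e-4  # W/m^2 (250 mW/cm^2)
LENGTH = 1e-2               # m, interaction length L
DIAMETER = 3e-3             # m, illuminating beam diameter D


def bragg_angle():
    """Bragg angle from sin(theta_B) = lambda * f0 / (2 n v) (assumed form)."""
    return asin(WAVELENGTH * F0 / (2 * N_INDEX * V_SOUND))


def efficiency_exact():
    """Diffraction efficiency from the sin^2 form (assumed form of Eq. 1)."""
    x = pi * LENGTH / (sqrt(2) * WAVELENGTH * cos(bragg_angle()))
    return sin(x * sqrt(M_FIGURE * I_ACOUSTIC)) ** 2


def efficiency_linear():
    """Small-signal efficiency, linear in the acoustic intensity (relation 3)."""
    c2 = cos(bragg_angle()) ** 2
    return pi ** 2 * LENGTH ** 2 * M_FIGURE * I_ACOUSTIC / (2 * WAVELENGTH ** 2 * c2)


def resolvable_spots():
    """Number of resolvable spots; the pi/4 prefactor is an assumption."""
    return pi * DELTA_F * DIAMETER / (4 * V_SOUND * cos(bragg_angle()))


if __name__ == "__main__":
    print(f"Bragg angle        : {bragg_angle() * 1e3:.2f} mrad")
    print(f"eta (linearized)   : {efficiency_linear():.3f}")
    print(f"eta (sin^2 form)   : {efficiency_exact():.3f}")
    print(f"resolvable spots N : {resolvable_spots():.1f}")
```

Under these assumptions the linearized relation reproduces the quoted efficiency of about 0.34, while the $\sin^2$ form gives a somewhat smaller value of roughly 0.30, so at this acoustic intensity the small-power approximation is only approximate.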

A  final  consideration  concerns  the  phase  mismatch 
of  the  AO  cells.  We  have  assumed  that  the  beams 
incident  upon  the  cells  arrive  at  the  correct  angle  for 
Bragg  diffraction.  While  this  is  true  for  AO  cell  array 
A  (Fig.  1),  it  is  not  generally  true  for  AO  cell  B.  In 
fact,  the  beams  incident  upon  cell  B  necessarily  arrive 
at  different  angles  because  they  are  the  output  beams 
of  the  AO  cell  array  A  and  cannot  arrive  at  the  correct 
Bragg angle. This phase mismatch results in a decrease
in the number of resolvable spots, although it
may  be  possible  to  reduce  the  effects  of  the  phase 
mismatch  by  the  use  of  correcting  lenses  or  lens  arrays. 

In this Letter we have presented a matrix-matrix multiplier. Its operation is based on the modulation
and deflection properties of AO cells. It is fast because of its parallel architecture. The dimension of
the matrices that it can multiply is sufficiently large. This processor can be used for implementations of
algorithms that require matrix-matrix multiplications such as LU decomposition, QR decomposition,
direct solution of linear equations, and Kalman filtering.³